REPEATED LIKELIHOOD RATIO TESTS
FOR  CURVED  EXPONENTIAL  FAMILIES 


1. Introduction

We study here the significance levels of repeated likelihood ratio tests for nested hypotheses in multiparameter exponential families. In various hypothesis testing situations, the peculiar constraints of medical research (and the even more peculiar constraints of non-medical research) occasionally preclude the determination of a sample size in advance of experimental results; various authors (notably Armitage [1], [2], and Schwartz [11]) have argued that in many such cases a reasonable option is provided by certain simple stopping rules based on the behavior of a (generalized) likelihood ratio statistic. Unfortunately, determining the operating characteristics of such procedures remains a difficult problem, although in recent years important advances have been made by Woodroofe [14], [15], [16], Siegmund [12], and Lai & Siegmund [9], [10].

Let $(P_\theta, \theta \in \Omega)$ be an exponential family of probability measures on $\mathbb{R}^p$:

(1.1)  $(dP_\theta/dP_0)(x) = \exp\{\theta^T x - \psi(\theta)\}$.

The natural parameter space $\Omega$ is assumed to be an open subset of $\mathbb{R}^p$, and $\psi$ is assumed to be strictly convex on $\Omega$. Suppose that $\Omega_1$ is a smooth relatively closed $q_1$-dimensional submanifold of $\Omega$, and that $\Omega_0$ is a smooth relatively closed $q_0$-dimensional submanifold of $\Omega_1$, where $0 < q_0 < q_1 \le p$: these are to be the null and alternative hypotheses, respectively. (In the terminology of Efron [5], the families $\{P_\theta : \theta \in \Omega_i\}$ are "curved exponential families.") Let $X_1, X_2, \ldots$ be i.i.d. from $P_\theta$, and let $e^{\Lambda_n}$ be the generalized likelihood ratio statistic for testing

$H_0 : \theta \in \Omega_0$  versus  $H_1 : \theta \in \Omega_1$.

The repeated likelihood ratio test will be based on the stopping rule $T \wedge m_1$, where

(1.2)  $T = T_a = \min\{n \ge m_0 : \Lambda_n > a\}$;

if $T \le m_1$, $H_0$ should be rejected, whereas if $T > m_1$, $H_0$ should not be rejected.

The main result of this work is that for $m_0 \sim a\epsilon_2^{-1}$ and $m_1 \sim a\epsilon_1^{-1}$, and $\theta_0 \in \Omega_0$,

(1.3)  $P_{\theta_0}\{T_a \le m_1\} \sim C\, a^{(q_1-q_0)/2} e^{-a}$

as $a \to \infty$, provided a certain host of regularity conditions are satisfied. The constant $C$, which depends on $\theta_0$, $\epsilon_1$, and $\epsilon_2$, will take the unpleasant form of a surface integral in $\mathbb{R}^p$, which may, however, be evaluated numerically in many cases of statistical interest.


2.  Example:  Testing  for  the  Equality  of  Two  Bernoulli  Parameters 



Suppose we observe a sequence $\{(X_i, Y_i) : i = 1, 2, \ldots\}$ of i.i.d. random vectors taking values in the set $\{0,1\}^2$, with

(2.1)  $P_{p_1,p_2}\{X_i = e_1, Y_i = e_2\} = p_1^{e_1}(1-p_1)^{1-e_1}\, p_2^{e_2}(1-p_2)^{1-e_2}$,

where $e_1, e_2 \in \{0,1\}$. The parameters $p_1$ and $p_2$ are unknown; we wish to test the hypothesis $p_1 = p_2$.

Imagine that the variables $X_i, Y_i$ are success indicators in a clinical trial. Patients suffering from a particular disorder arrive infrequently at a clinic where they may be treated according to one of two procedures: because of the nature of the disorder the patients must be treated immediately, and a response (success or failure) is apparent within a relatively short period of time (compared to interarrival times). If the disorder is serious, sequential experimentation to compare the efficacies of the two procedures may be appropriate.

Such a situation was considered by Siegmund and Gregory [13], who proposed several sequential procedures for testing the hypothesis $p_1 = p_2$. One of these was a sequential version of the generalized likelihood ratio test, which had previously been studied in different contexts by Armitage [1], [2], Schwartz [11], Siegmund [12], and Woodroofe [15], [16]. This test is easily described. Let
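(2.2)  $\bar x_n = n^{-1}\sum_{i=1}^n X_i$,  $\bar y_n = n^{-1}\sum_{i=1}^n Y_i$,  $\Lambda_n = n\, I(\bar x_n, \bar y_n)$,

where $I(x,y) = H(x) + H(y) - 2H((x+y)/2)$ and $H(\omega) = \omega\log\omega + (1-\omega)\log(1-\omega)$ (with $0\log 0 = 0$).

The variable $\Lambda_n$ is the logarithm of the generalized likelihood ratio statistic, which is commonly employed as a test statistic in fixed sample procedures. In using it as the basis for a sequential test, one observes pairs $(X_i, Y_i)$ until the time $T \wedge m_1$ ($m_1$ being some fixed patient horizon), rejecting the hypothesis $p_1 = p_2$ iff $T \le m_1$.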
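The test just described is simple to carry out. The following Python sketch assumes the usual two-sample form of the statistic, $\Lambda_n = n\,I(\bar x_n, \bar y_n)$ with $I(x,y) = H(x) + H(y) - 2H((x+y)/2)$ and $H(\omega) = \omega\log\omega + (1-\omega)\log(1-\omega)$; the function names `H`, `I`, and `glr_stop` are illustrative only.

```python
import math


def H(w):
    """Negative entropy H(w) = w log w + (1 - w) log(1 - w), with 0 log 0 = 0."""
    return sum(t * math.log(t) for t in (w, 1.0 - w) if t > 0.0)


def I(x, y):
    """Per-observation log GLR for testing p1 = p2."""
    return H(x) + H(y) - 2.0 * H((x + y) / 2.0)


def glr_stop(pairs, a, m0, m1):
    """Repeated GLR test with T = min{n >= m0 : Lambda_n > a}, truncated at m1.

    `pairs` is a sequence of 0/1 pairs (X_i, Y_i) of length at least m1.
    Returns (T ∧ m1, reject), where reject is True iff T <= m1.
    """
    sx = sy = 0
    for n, (x, y) in enumerate(pairs[:m1], start=1):
        sx += x
        sy += y
        if n >= m0 and n * I(sx / n, sy / n) > a:
            return n, True          # boundary crossed by the horizon: reject H0
    return m1, False                # no crossing by time m1: do not reject H0
```

For example, if every observed pair equals (0, 1), then $\Lambda_n = 2n\log 2$, so `glr_stop([(0, 1)] * 50, a=10, m0=1, m1=50)` stops at $n = 8$, the first $n$ with $2n\log 2 > 10$.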

The problem of computing significance levels and power functions for the test procedure just described is not nearly so easy as for the fixed-sample generalized likelihood ratio test, whose asymptotic theory has been thoroughly developed. Siegmund and Gregory [13] have derived heuristically an asymptotic formula for the Type I error probability; their formula agrees formally with a result of Woodroofe [15] which was proved under assumptions too stringent to include this problem as an admissible case. This formula is contained in Theorem 1 below.
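THEOREM 1. Let $\epsilon_1$ and $\epsilon_2$ be fixed constants such that $(2\log 2)^{-1} < \epsilon_1 < \epsilon_2$. Suppose that $m_0 = a\epsilon_1 + o(a)$ and $m_1 = a\epsilon_2 + o(a)$ (recall that $T_a = \min\{n \ge m_0 : \Lambda_n > a\}$). Then as $a \to \infty$

(2.3)  $a^{-1/2} e^{a} P_{p,p}\{T_a \le m_1\} \to C(p; \epsilon_1, \epsilon_2)$

for every $p \in (0,1)$. For $1/2 < p < 1$, $C(p; \epsilon_1, \epsilon_2) = C(1-p; \epsilon_1, \epsilon_2)$, and for $0 < p \le 1/2$

(2.4)  $C(p; \epsilon_1, \epsilon_2) = \pi^{-1/2} \int_{\{\zeta \in (0,2p)\,:\,\epsilon_2^{-1} < I(\zeta, 2p-\zeta) < \epsilon_1^{-1}\}} \nu(\zeta, 2p-\zeta)\,[I(\zeta, 2p-\zeta)]^{-1/2} \left[\frac{p(1-p)}{\zeta(1-\zeta)(2p-\zeta)(1+\zeta-2p)}\right]^{1/2} d\zeta$,

and

(2.5)  $\nu(p_1, p_2) = \lim_{a\to\infty} E_{p_1,p_2}\, e^{-(\Lambda_T - a)}$.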
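Once $\nu$ is available, (2.4) is a one-dimensional integral. A minimal midpoint-rule sketch follows; it assumes $0 < p \le 1/2$ and takes $\nu$ as a caller-supplied function (the names `C_const` and `nu` are hypothetical). Because $\Lambda_T > a$, each $e^{-(\Lambda_T - a)}$ is below 1, so $\nu \le 1$ and the placeholder $\nu \equiv 1$ yields an upper bound for $C$.

```python
import math


def H(w):
    """Negative entropy, with 0 log 0 = 0."""
    return sum(t * math.log(t) for t in (w, 1.0 - w) if t > 0.0)


def I(x, y):
    return H(x) + H(y) - 2.0 * H((x + y) / 2.0)


def C_const(p, eps1, eps2, nu, n_grid=20000):
    """Midpoint-rule approximation of C(p; eps1, eps2) in (2.4), for 0 < p <= 1/2.

    `nu(p1, p2)` is a caller-supplied stand-in for (2.5); this sketch does not
    compute it.
    """
    h = 2.0 * p / n_grid
    total = 0.0
    for k in range(n_grid):
        z = (k + 0.5) * h                       # midpoint of the k-th cell of (0, 2p)
        val = I(z, 2.0 * p - z)
        if not (1.0 / eps2 < val < 1.0 / eps1):
            continue                            # z lies outside the integration region
        w = p * (1 - p) / (z * (1 - z) * (2 * p - z) * (1 + z - 2 * p))
        total += nu(z, 2.0 * p - z) * val ** -0.5 * math.sqrt(w) * h
    return total / math.sqrt(math.pi)
```

With `nu=lambda u, v: 1.0` the result is the upper bound just mentioned; the integral is linear in $\nu$, so any better approximation of $\nu$ can be plugged in directly.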

That the limit in (2.5) exists (except for a countable set of $(p_1, p_2)$ for which $p_1 \ne p_2$) is a consequence of Theorem 1 of Lai and Siegmund [9]. In fact, Woodroofe [16] has obtained an integral formula for the function $\nu(p_1, p_2)$ which is explicit enough to allow numerical integration.
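Short of Woodroofe's formula, the limit in (2.5) can be approximated crudely by simulation at a fixed, moderately large $a$. The sketch below does this for the two-Bernoulli statistic. It is only an illustration under stated assumptions: $p_1 \ne p_2$ (so the boundary is crossed with probability one), no initial sample, and a finite $a$ standing in for the limit; `nu_mc` is a hypothetical name.

```python
import math
import random


def H(w):
    """Negative entropy, with 0 log 0 = 0."""
    return sum(t * math.log(t) for t in (w, 1.0 - w) if t > 0.0)


def I(x, y):
    return H(x) + H(y) - 2.0 * H((x + y) / 2.0)


def nu_mc(p1, p2, a, reps=2000, seed=1):
    """Monte Carlo estimate of E exp{-(Lambda_T - a)} at a fixed level a,
    with T = min{n >= 1 : Lambda_n > a} (assumes p1 != p2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sx = sy = n = 0
        while True:
            n += 1
            sx += rng.random() < p1      # X_n ~ Bernoulli(p1)
            sy += rng.random() < p2      # Y_n ~ Bernoulli(p2)
            lam = n * I(sx / n, sy / n)
            if lam > a:
                total += math.exp(-(lam - a))   # overshoot is positive, so the term is < 1
                break
    return total / reps
```

The estimate carries both Monte Carlo error and the bias from using a finite $a$; it is a sanity check, not a substitute for numerical evaluation of Woodroofe's formula.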

The restriction on the initial sample size is rather peculiar and deserves some comment. Notice that the function $I(p_1, p_2)$ is bounded for $(p_1, p_2) \in [0,1]^2$: it achieves a maximum of $2\log 2$ at the points $(0,1)$ and $(1,0)$. Thus $\Lambda_n > a$ can occur only if $n > a/(2\log 2)$. Moreover, if $\Lambda_n > a$ for some $n$ close to $a/(2\log 2)$, then $(\bar x_n, \bar y_n)$ must be close to either $(0,1)$ or $(1,0)$; since in most conceivable applications neither $p_1$ nor $p_2$ would be close to zero or one, it would be somewhat unsettling to terminate the experiment on the basis of such an anomalous sample. A larger initial sample size protects against this possibility.

In the following sections an analogue of Theorem 1 will be formulated and proved under the assumption that the observations are from a multiparameter exponential family. This theorem will have one major shortcoming: namely, it will be necessary to impose even more stringent requirements on the initial sample size. (For the problem discussed in this section, the hypotheses of Theorem 2 would require $\epsilon_1 > (\log 2)^{-1}$, i.e., that the initial sample size be twice as large as Theorem 1 requires it to be.) The mathematical difficulty which necessitates the stronger conditions stems from the fact that large deviations theorems need not in general be uniform near the "boundary" of an exponential family. Fortunately, this difficulty disappears in many concrete cases of practical importance: for instance, whenever the mean parameter space is all of $\mathbb{R}^p$, and also in multinomial families.

We will give a (somewhat sketchy) proof of Theorem 1 for those cases where

(2.6)  $N = N(\epsilon_1, \epsilon_2, p) = \{(r, 2p-r) : 0 \le r \le 2p \text{ and } \epsilon_2^{-1} < I(r, 2p-r) < \epsilon_1^{-1}\}$

is contained in the open square $(0,1) \times (0,1)$. The argument has two steps: first we show that only those sample paths for which $(\bar x_T, \bar y_T)$ is "near" $N$ contribute substantially to the probability in (2.3); then we perform a local analysis near $N$.

PROPOSITION 1. As $a \to \infty$

(2.7)  $P_{p,p}\{T \le m_1;\ \mathrm{dist}((\bar x_T, \bar y_T), N) > a^{-1/2}\log a\} = o(a^{-k} e^{-a})$

for every $k > 0$.


NOTE: In adapting the arguments presented here to the more general problem discussed in the following sections, the primary difficulty is in obtaining analogues of Proposition 1 (cf. Section 6). It is because of these difficulties that the more stringent assumptions on the initial sample size are necessary.


PROOF. This is based on the "fundamental identity of sequential analysis," viz.,

(2.8)  $P_{p,p}(A) = \int_A L_T\, dQ = \int_0^1\!\int_0^1 E_{p_1,p_2}(1_A L_T)\, dp_1\, dp_2$,

where

(2.9)  $Q(B) = \int_0^1\!\int_0^1 P_{p_1,p_2}(B)\, dp_1\, dp_2$

and

(2.10)  $L_n = (n+1)^2 \binom{n}{n\bar x_n}\binom{n}{n\bar y_n}\, p^{n(\bar x_n + \bar y_n)} (1-p)^{n(2 - \bar x_n - \bar y_n)}$.

(Here $Q(B)$ is defined for all events $B \in \sigma((X_1,Y_1), (X_2,Y_2), \ldots)$, and (2.8) holds for all $A$ in the "stopped" $\sigma$-algebra of events $A$ such that $A \cap \{T \le n\} \in \sigma((X_1,Y_1), \ldots, (X_n,Y_n))$ for all $n$.)

Stirling's formula (cf. Feller [6], Chapter II, inequality (9.15)) provides a (crude) upper bound for $L_n$:

(2.11)  $L_n \le C n^2 e^{-\Lambda_n} e^{-\xi_n}$

for some constant $C > 0$, where

(2.12)  $\xi_n = n[2H((\bar x_n + \bar y_n)/2) - (\bar x_n + \bar y_n)\log(p/(1-p)) - 2\log(1-p)]$.

Now

$\omega \mapsto 2H(\omega) - 2\omega\log(p/(1-p)) - 2\log(1-p)$

(which is twice the Kullback-Leibler divergence of the Bernoulli($\omega$) law from the Bernoulli($p$) law) is a strictly convex, smooth, nonnegative function of $\omega \in (0,1)$ which is zero for $\omega = p$ and satisfies $H''(\omega) > 0$. Thus there is a constant $C' > 0$ such that $(x,y) \in (0,1)^2$ and

(2.13)  $\mathrm{dist}((x,y), \{(r, 2p-r) : 0 \le r \le 2p\}) > \delta$

implies

(2.14)  $2H((x+y)/2) - (x+y)\log(p/(1-p)) - 2\log(1-p) > C'\delta^2$.
Clearly (2.8), (2.11), and (2.14) imply that for every $\delta > 0$

(2.15)  $P_{p,p}\{T \le m_1;\ \mathrm{dist}((\bar x_T, \bar y_T), \{(r, 2p-r) : 0 \le r \le 2p\}) > \delta a^{-1/2}\log a\} = o(a^{-k} e^{-a})$

for every $k > 0$, since $\Lambda_T > a$.
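The bound (2.11) rests on the entropy bound $\binom{n}{k} \le e^{-nH(k/n)}$; with the explicit factor $(n+1)^2$ in place of the constant, it can be checked exhaustively for small $n$. The following Python sketch (function names hypothetical) computes $\log L_n$ from (2.10) and $\log[(n+1)^2 e^{-\Lambda_n - \xi_n}]$ from (2.12) for every possible outcome $(n\bar x_n, n\bar y_n) = (k, l)$.

```python
import math


def H(w):
    """Negative entropy, with 0 log 0 = 0."""
    return sum(t * math.log(t) for t in (w, 1.0 - w) if t > 0.0)


def log_L(n, k, l, p):
    """log L_n from (2.10), where k = n*xbar_n and l = n*ybar_n."""
    return (2 * math.log(n + 1) + math.log(math.comb(n, k)) + math.log(math.comb(n, l))
            + (k + l) * math.log(p) + (2 * n - k - l) * math.log(1 - p))


def log_bound(n, k, l, p):
    """log[(n + 1)^2 exp(-Lambda_n - xi_n)], with xi_n as in (2.12)."""
    x, y = k / n, l / n
    m = (x + y) / 2
    lam = n * (H(x) + H(y) - 2 * H(m))
    xi = n * (2 * H(m) - (x + y) * math.log(p / (1 - p)) - 2 * math.log(1 - p))
    return 2 * math.log(n + 1) - lam - xi
```

The difference `log_bound - log_L` equals $[-nH(\bar x_n) - \log\binom{n}{n\bar x_n}] + [-nH(\bar y_n) - \log\binom{n}{n\bar y_n}] \ge 0$, with equality only when both samples are all successes or all failures.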
Similarly it may be shown that if $T \le m_1$,

$\mathrm{dist}((\bar x_T, \bar y_T), N) > a^{-1/2}\log a$, but
$\mathrm{dist}((\bar x_T, \bar y_T), \{(r, 2p-r) : 0 \le r \le 2p\}) \le \delta a^{-1/2}\log a$

for some (sufficiently small) $\delta > 0$, then

(2.16)  $\Lambda_T - a \ge C''(\log a)^2$;

this together with (2.8) and (2.11) implies

(2.17)  $P_{p,p}\{T \le m_1;\ \mathrm{dist}((\bar x_T, \bar y_T), N) > a^{-1/2}\log a;\ \mathrm{dist}((\bar x_T, \bar y_T), \{(r, 2p-r) : 0 \le r \le 2p\}) \le \delta a^{-1/2}\log a\} = o(a^{-k} e^{-a})$

for all $k > 0$. This and (2.15) imply (2.7). ///

For the next step of the proof we will again exploit the fundamental identity of sequential analysis, but with a new probability measure, which we will again refer to as $Q$. Let

(2.18)  $Q(B) = \int_0^{2p} P_{r,2p-r}(B)\, dr/(2p)$;

then

(2.19)  $dP^{(n)}_{p,p}/dQ^{(n)} = L_n = p^{n(\bar x_n + \bar y_n)}(1-p)^{n(2-\bar x_n-\bar y_n)} \left[\int_0^{2p} r^{n\bar x_n}(1-r)^{n-n\bar x_n}(2p-r)^{n\bar y_n}(1+r-2p)^{n-n\bar y_n}\, dr/(2p)\right]^{-1}$

(where $P^{(n)}_{p,p}$ and $Q^{(n)}$ denote the restrictions of $P_{p,p}$ and $Q$ to $\sigma((X_1,Y_1), \ldots, (X_n,Y_n))$).


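PROPOSITION 2. Suppose that for some $(r, 2p-r) \in N$,

(2.20)  $\mathrm{dist}((\bar x_n, \bar y_n), (r, 2p-r)) \le n^{-1/2+1/7}$.

Then as $n \to \infty$

(2.21)  $L_n \sim e^{-\Lambda_n} (n/\pi)^{1/2} (2p) \left((2r(1-r))^{-1} + (2(2p-r)(1+r-2p))^{-1}\right)^{1/2} \exp\left\{\frac{n}{2} (\bar x_n - r,\ \bar y_n + r - 2p)\, M\, (\bar x_n - r,\ \bar y_n + r - 2p)^T\right\}$,

where

(2.22)  $M = \dfrac{(r(1-r))^{-1}((2p-r)(1+r-2p))^{-1}}{(r(1-r))^{-1} + ((2p-r)(1+r-2p))^{-1}} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} - \dfrac{1}{2}(p(1-p))^{-1} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.

Relation (2.21) holds uniformly for $(\bar x_n, \bar y_n)$ satisfying (2.20) with $(r, 2p-r) \in N$.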

The  proof  of  this  is  omitted:  it  is  a  straightforward  but 
tedious  exercise  in  the  use  of  Laplace's  method  of  asymptotic 
expansion. 
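The Laplace-method claim (2.21) is easy to test numerically at a point where $(\bar x_n, \bar y_n)$ lies exactly on the segment, so that the quadratic term in the exponent vanishes. The sketch below assumes (2.19) has the form $L_n = p^{n(\bar x_n+\bar y_n)}(1-p)^{n(2-\bar x_n-\bar y_n)}$ divided by $\int_0^{2p} r^{n\bar x_n}(1-r)^{n-n\bar x_n}(2p-r)^{n\bar y_n}(1+r-2p)^{n-n\bar y_n}\,dr/(2p)$, evaluates that integral by a midpoint rule in log-space, and compares the result with the leading term $e^{-\Lambda_n}(n/\pi)^{1/2}(2p)((2r(1-r))^{-1} + (2(2p-r)(1+r-2p))^{-1})^{1/2}$. Function names are hypothetical.

```python
import math


def H(w):
    """Negative entropy, with 0 log 0 = 0."""
    return sum(t * math.log(t) for t in (w, 1.0 - w) if t > 0.0)


def log_L_exact(n, k, l, p, grid=60000):
    """log L_n from (2.19), with k = n*xbar_n and l = n*ybar_n; the r-integral
    is computed by a midpoint rule in log-space (log-sum-exp)."""
    h = 2 * p / grid
    logs = []
    for j in range(grid):
        r = (j + 0.5) * h
        logs.append(k * math.log(r) + (n - k) * math.log(1 - r)
                    + l * math.log(2 * p - r) + (n - l) * math.log(1 + r - 2 * p))
    top = max(logs)
    log_integral = top + math.log(sum(math.exp(v - top) for v in logs) * h / (2 * p))
    log_numerator = (k + l) * math.log(p) + (2 * n - k - l) * math.log(1 - p)
    return log_numerator - log_integral


def log_L_leading(n, r, p):
    """log of the right side of (2.21) when (xbar_n, ybar_n) = (r, 2p - r),
    so that the quadratic form in the exponent vanishes."""
    x, y = r, 2 * p - r
    lam = n * (H(x) + H(y) - 2 * H((x + y) / 2))
    u, w = r * (1 - r), (2 * p - r) * (1 + r - 2 * p)
    return (-lam + 0.5 * math.log(n / math.pi) + math.log(2 * p)
            + 0.5 * math.log(1 / (2 * u) + 1 / (2 * w)))
```

With $n = 2000$, $p = 0.3$ and $r = 0.2$ (so $n\bar x_n = 400$ and $n\bar y_n = 800$), the two logarithms should agree up to the $O(1/n)$ error of Laplace's method.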

The strategy for the rest of the proof is to show that for each $(r, 2p-r) \in N$, $(\Lambda_T - a)$ and

$T^{1/2}(\bar x_T - r,\ \bar y_T + r - 2p)^T$

are approximately independent under $E_{r,2p-r}$ as $a \to \infty$. For then the Central Limit Theorem, the Nonlinear Renewal Theorem of Lai and Siegmund [9], and the fact that

(2.23)  $T/a \to 1/I(r, 2p-r)$  $P_{r,2p-r}$-a.s.

will make possible the evaluation of $E_{r,2p-r}\, a^{-1/2} e^{a} L_T 1_A$, where

(2.24)  $A = \{T \le m_1;\ \mathrm{dist}((\bar x_T, \bar y_T), N) \le a^{-1/2}\log a\}$.

Nearly all of the technical difficulties associated with this program are obviated by the following inequalities.


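LEMMA 1. Let $S_n$ have a binomial distribution $\mathrm{Bi}(n,p)$ under $P_p$, $p \in [0,1]$. Then for each $k > 0$, $\delta > 0$, and $\alpha > 0$

(2.25)  $\max_{0 \le p \le 1} P_p\{|S_n - np| > \delta n^{1/2}\log n\} = o(n^{-k})$

and

(2.26)  $\max_{0 \le p \le 1} P_p\{|S_n - np| > \delta n^{1/2+\alpha}\} = o(e^{-n^{\alpha}})$.

PROOF. Using the Markov inequality,

$P_p\{S_n - np \le n^{1/2} f(n)\} \le E_p \exp\{-\beta n^{-1/2}(S_n - np)\}/e^{-\beta f(n)}$
$= e^{\beta p n^{1/2}}(1 - p + p e^{-\beta n^{-1/2}})^n / e^{-\beta f(n)}$
$= \exp\{\beta p n^{1/2} + n(-\beta p n^{-1/2} + O(1/n))\}/e^{-\beta f(n)} = O(1)/e^{-\beta f(n)}$

for $\beta > 0$, $f(n) < 0$. It is clear from the Taylor series expansion that the $O(1)$ term is uniform in $p$. The reverse inequality may be obtained similarly; (2.25) and (2.26) then follow on taking $f(n) = -\delta\log n$ and $f(n) = -\delta n^{\alpha}$, respectively, with $\beta$ sufficiently large. ///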
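The exponential Markov step in the proof can be compared directly with the exact binomial tail. This sketch (hypothetical names) evaluates $P_p\{S_n - np \le n^{1/2} f\}$ exactly together with the bound $e^{\beta p n^{1/2}}(1 - p + p e^{-\beta n^{-1/2}})^n e^{\beta f}$; Markov's inequality guarantees that the second dominates the first for every $\beta > 0$.

```python
import math


def lower_tail(n, p, f):
    """Exact P_p{S_n - n p <= sqrt(n) f} for S_n ~ Bi(n, p)."""
    cut = n * p + math.sqrt(n) * f
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k <= cut)


def markov_bound(n, p, f, beta):
    """E_p exp{-beta n^{-1/2}(S_n - n p)} / e^{-beta f}, as in the proof of Lemma 1."""
    t = beta / math.sqrt(n)
    mgf = math.exp(beta * p * math.sqrt(n)) * (1 - p + p * math.exp(-t)) ** n
    return mgf * math.exp(beta * f)
```

Since the moment-generating factor is $O(1)$ uniformly in $p$, the bound decays like $e^{\beta f}$; this is the source of the polynomial and stretched-exponential rates in (2.25) and (2.26).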


COROLLARY 1. As $a \to \infty$

(2.27)  $\max_{(r,2p-r) \in N} P_{r,2p-r}\{T_a \notin [n_2, n_3]\} = o(e^{-a^{\eta/4}})$,

(2.28)  $\max_{(r,2p-r) \in N} P_{r,2p-r}\{|\zeta_n(r) - \zeta_{n_1}(r)| > C/\log a \text{ for some } n \in [n_2, n_3]\} = o(e^{-a^{1/16}})$,

(2.29)  $\max_{(r,2p-r) \in N} P_{r,2p-r}\{|\bar x_n - \bar x_{n_1}| + |\bar y_n - \bar y_{n_1}| > C a^{-1/2}\log a \text{ for some } n \in [n_2, n_3]\} = o(e^{-a^{1/32}})$;

here

(2.30)  $n_1 = n_1(a, r) = [a/I(r, 2p-r) - a^{1/2+\eta}]$,

(2.31)  $n_2 = n_2(a, r) = [a/I(r, 2p-r) - a^{1/2+\eta/2}]$,

(2.32)  $n_3 = n_3(a, r) = [a/I(r, 2p-r) + a^{1/2+\eta/2}]$,

(2.34)  $\zeta_n = \zeta_n(r) = n\,(\bar x_n - r,\ \bar y_n + r - 2p)\, M\, (\bar x_n - r,\ \bar y_n + r - 2p)^T$,

and $\eta \in (0, 1/32)$ is some fixed constant. Relations (2.28) and (2.29) hold for all $C > 0$.

The corollary is an easy consequence of the preceding lemma. Define

(2.35)  $A_r = A \cap \{T \in [n_2, n_3]\}$
        $\cap\ \{|\zeta_T(r) - \zeta_{n_1}(r)| \le 1/\log a\}$
        $\cap\ \{|\bar x_T - \bar x_{n_1}| + |\bar y_T - \bar y_{n_1}| \le a^{-1/2}\log a\}$
        $\cap\ \{|\bar x_{n_1} - r| + |\bar y_{n_1} + r - 2p| \le a^{-1/2}\log a\}$;

by Proposition 2

(2.36)  $1_A\, a^{-1/2} e^{a} L_T = O(e^{C(\log a)^2})$

for some $C > 0$, so the Corollary implies that

(2.37)  $a^{-1/2} e^{a} P_{p,p}(A) = E_Q(1_A\, a^{-1/2} e^{a} L_T) = \int_0^{2p} E_{r,2p-r}(1_{A_r}\, a^{-1/2} e^{a} L_T)\, dr/(2p) + o(1)$.

Now by Proposition 2 and (2.35),

(2.38)  $1_{A_r}\, a^{-1/2} e^{a} L_T \sim e^{-(\Lambda_T - a)} (\pi I(r, 2p-r))^{-1/2} (2p) \left((2r(1-r))^{-1} + (2(2p-r)(1+r-2p))^{-1}\right)^{1/2} \exp\{\zeta_{n_1}(r)/2\}\, 1_{A_r}$,

and this holds uniformly for $(r, 2p-r) \in N$ on $A_r$.

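The "asymptotic independence" argument is completed by the following result.

PROPOSITION 3. For each $(r, 2p-r) \in N$ such that the random variable

$I(r, 2p-r) + (X_1 - r)\,\dfrac{\partial I}{\partial p_1}(r, 2p-r) + (Y_1 + r - 2p)\,\dfrac{\partial I}{\partial p_2}(r, 2p-r)$

has a nonlattice distribution when $(X_1, Y_1) \sim P_{r,2p-r}$,

(2.39)  $E_{r,2p-r}[e^{-(\Lambda_T - a)} \mid \mathcal{F}_{n_1}] - \nu(r, 2p-r) \to 0$  in $P_{r,2p-r}$-probability,

where $\mathcal{F}_n = \sigma((X_1,Y_1), \ldots, (X_n,Y_n))$.

This result is implicit in the proof of the Nonlinear Renewal Theorem given by Lai and Siegmund [9].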

It is relatively easy to deduce Theorem 1 from (2.37)-(2.39). Uniform integrability problems may be handled by using (2.36), the Lemma, the Corollary, and the Berry-Esseen Theorem (for random vectors). The details of these arguments are straightforward but tedious, and will be omitted: they would, perhaps, serve only to obscure the basic argument. The bloodthirsty reader should rest assured that his appetite for raw, gory arguments (and detailed obscurity) will almost certainly be satisfied by the end of this work.




3.  Preliminaries  Concerning  Exponential  Families 
and  Statement  of  the  Main  Result 

Let $X_1, X_2, \ldots$ be an i.i.d. sequence of random vectors each with law $P_\theta$. We will not distinguish between $P_\theta$, the measure on $\mathbb{R}^p$, and $P_\theta$, the measure on the $\sigma$-algebra $\sigma(X_1, X_2, \ldots)$. Recall that if $X \sim P_\theta$, then

(3.1)  $E_\theta X = \nabla\psi(\theta) = \mu_\theta$,
       $\mathrm{Cov}_\theta X = \nabla^2\psi(\theta) = \Psi(\theta)$;

since we have assumed $\psi$ to be strictly convex, $\Psi(\theta)$ is perforce positive definite for each $\theta \in \Omega$. Assume that $\mu_0 = 0$.

Let $\Gamma = \{\mu_\theta : \theta \in \Omega\}$ be the mean parameter space. Because $\Psi(\theta) = \nabla^2\psi(\theta)$ is strictly positive definite, the map

(3.2)  $\nabla\psi : \Omega \to \Gamma$  by  $\theta \mapsto \mu_\theta$

is a diffeomorphism (this is the Inverse Function Theorem of Calculus). Thus although $\Gamma$ need not be convex, it is an open subset of $\mathbb{R}^p$.

The (nonnegative) function

(3.3)  $\phi(x) = \sup_{\theta \in \Omega}(\theta^T x - \psi(\theta))$

is the "convex dual" of $\psi$. For $x \in \Gamma$ the supremum in (3.3) is uniquely attained at that $\theta$ for which $\mu_\theta = x$; henceforth this $\theta$ will be referred to as $\hat\theta(x)$. It is evident that for $x \in \Gamma$

(3.4)  $\nabla_x \phi(x) = \hat\theta(x)$,
       $\nabla_x^2 \phi(x) = \Psi(\hat\theta(x))^{-1}$.

Moreover, since $0 \in \Omega$ (by (1.1)), the set

(3.5)  $\{x \in \mathbb{R}^p : \phi(x) \le b\}$

is compact for each $b > 0$.
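For a one-parameter illustration of (3.3) and (3.4), take the Bernoulli family in natural form, $\psi(\theta) = \log(1+e^\theta)$ with $p = 1$. Its convex dual is $\phi(x) = x\log x + (1-x)\log(1-x)$ (the function $H$ of Section 2), with $\hat\theta(x) = \log(x/(1-x))$ and $\nabla^2\phi(x) = \Psi(\hat\theta(x))^{-1} = 1/(x(1-x))$. The sketch below (names hypothetical) computes the supremum numerically by bisection on $\psi'(\theta) = x$, so that these closed forms can be checked.

```python
import math


def psi(theta):
    """Log-partition of the Bernoulli family in natural form: log(1 + e^theta)."""
    return math.log1p(math.exp(theta))


def mean(theta):
    """mu_theta = psi'(theta), the logistic function."""
    return 1.0 / (1.0 + math.exp(-theta))


def phi(x, tol=1e-12):
    """Convex dual (3.3) for 0 < x < 1, returned as (phi(x), theta_hat(x)).

    The supremum is attained where mean(theta) = x; since mean is increasing
    (psi is strictly convex), bisection on theta finds it.
    """
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < x:
            lo = mid
        else:
            hi = mid
    theta_hat = 0.5 * (lo + hi)
    return theta_hat * x - psi(theta_hat), theta_hat
```

A second difference of `phi(x)[0]` with step $10^{-4}$ at $x = 0.3$ reproduces $1/(0.3 \cdot 0.7)$, in line with the second relation in (3.4).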

Recall that $\Omega_1$ is a smooth, relatively closed $q_1$-dimensional submanifold of $\Omega$ and $\Omega_0$ is a smooth relatively closed $q_0$-dimensional submanifold of $\Omega_1$. Since $\theta \mapsto \mu_\theta$ is a diffeomorphism, $\Gamma_i = \{\mu_\theta : \theta \in \Omega_i\}$ are smooth relatively closed submanifolds of $\Gamma$. (NOTE: A convenient and elementary source of information concerning the topological and geometric concepts used here is Guillemin and Pollack [8].) Define convex functions $\phi_0$ and $\phi_1$ by

(3.6)  $\phi_i(x) = \sup_{\theta \in \Omega_i}(\theta^T x - \psi(\theta))$;

the log generalized likelihood ratio statistic is then

(3.7)  $\Lambda_n = n(\phi_1(S_n/n) - \phi_0(S_n/n))$,

where $S_n = X_1 + \cdots + X_n$. It is apparent that the behavior of the functions $\phi_0$ and $\phi_1$ will play a crucial role in all that follows.
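As a check on (3.6) and (3.7), the example of Section 2 fits this framework with $p = 2$, $\theta = (\log(p_1/(1-p_1)), \log(p_2/(1-p_2)))$, $\psi(\theta) = \log(1+e^{\theta_1}) + \log(1+e^{\theta_2})$, $\Omega_1 = \mathbb{R}^2$ and $\Omega_0 = \{\theta_1 = \theta_2\}$ (so $q_1 = 2$, $q_0 = 1$). The sketch below (names hypothetical) maximizes over $\Omega_0$ numerically and confirms that $\phi_1 - \phi_0$ reproduces $I(x,y) = H(x) + H(y) - 2H((x+y)/2)$, so that (3.7) agrees with $\Lambda_n = n\,I(\bar x_n, \bar y_n)$.

```python
import math


def psi(t):
    """One-coordinate Bernoulli log-partition, log(1 + e^t)."""
    return math.log1p(math.exp(t))


def H(w):
    """Negative entropy, with 0 log 0 = 0."""
    return sum(s * math.log(s) for s in (w, 1.0 - w) if s > 0.0)


def phi1(x, y):
    """phi_1 for Omega_1 = R^2: the sup in (3.6) separates into two Bernoulli duals."""
    return H(x) + H(y)


def phi0(x, y, iters=200):
    """phi_0 for Omega_0 = {(t, t)}: maximize the concave function
    t (x + y) - 2 psi(t) by golden-section search on [-30, 30]."""
    f = lambda t: t * (x + y) - 2.0 * psi(t)
    lo, hi = -30.0, 30.0
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(c) < f(d):
            lo = c          # maximum lies in [c, hi]
        else:
            hi = d          # maximum lies in [lo, d]
    return f(0.5 * (lo + hi))
```

The maximizer over $\Omega_0$ is $t = \log(m/(1-m))$ with $m = (x+y)/2$, which is the pooled estimate one expects under $p_1 = p_2$.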

Unfortunately, for a given $x \in \mathbb{R}^p$ the supremum in (3.6) need not be uniquely attained. For $\theta_x \in \Omega_i$, a necessary condition for

(3.8)  $\theta_x^T x - \psi(\theta_x) = \sup_{\theta \in \Omega_i}(\theta^T x - \psi(\theta))$

is

(3.9)  $x - \nabla\psi(\theta_x) \perp T\Omega_i(\theta_x)$,

where $T\Omega_i(\theta_x)$ is the space of tangent vectors to $\Omega_i$ at $\theta_x$ in $\mathbb{R}^p$. (Throughout the paper we will use the notation $TN(y)$ to denote the space of tangent vectors to $N$ at $y$: i.e., if $N$ is a $q$-dimensional submanifold of $\mathbb{R}^p$, then for $y \in N$

(3.10)  $TN(y) = \{v \in \mathbb{R}^p : \exists \text{ smooth } g : [-1,1] \to N \text{ with } g(0) = y \text{ and } g'(0) = v\}$.

For each $y \in N$, $TN(y)$ is a $q$-dimensional vector subspace of $\mathbb{R}^p$.) Let

(3.11)  $U_i = \{x \in \Gamma : \text{the supremum in (3.6) is attained uniquely at some point } \hat\theta_i(x)\}$.

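LEMMA 1. For each $\theta \in \Omega_1$, the affine space $\mu_\theta + T\Omega_1(\theta)^\perp$ intersects $\Gamma_1$ transversally at $\mu_\theta$. Furthermore, for each $\theta_0 \in \Omega_1$ there is a neighborhood $N(\mu_{\theta_0})$ of $\mu_{\theta_0}$ (open in $\Gamma$) such that $N(\mu_{\theta_0}) \subset U_1$ and such that

$\hat\theta_1 : N(\mu_{\theta_0}) \to \Omega_1$

is a smooth submersion. If $x \in \Gamma$ has an open neighborhood $N_x \subset U_i$ such that $\hat\theta_i : N_x \to \Omega_i$ is a smooth submersion, then

(3.12)  $\nabla_x \phi_i(x) = \hat\theta_i(x)$,

(3.13)  $v^T \nabla_x^2 \phi_i(x)\, v > 0$  for all $v \in T\Omega_i(\hat\theta_i(x))$, $v \ne 0$,

and

(3.14)  $\omega^T \nabla_x^2 \phi_i(x)\, \omega = 0$  for all $\omega \in T\Omega_i(\hat\theta_i(x))^\perp$.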

NOTE: Let $N_1$ and $N_2$ be smooth submanifolds of $\mathbb{R}^p$, and let $y \in N_1 \cap N_2$. Then $N_1$ and $N_2$ are said to intersect transversally at $y$ if the tangent spaces together span $\mathbb{R}^p$, i.e., if

(3.15)  $TN_1(y) + TN_2(y) = \mathbb{R}^p$.

A map $g : N_1 \to N_2$ is said to be submersive at $x \in N_1$ if for every $v \in TN_2(g(x))$ there exists a smooth $f : [-1,1] \to N_1$ with $f(0) = x$ such that $(g \circ f)'(0) = v$; i.e., if $dg_x$ maps $TN_1(x)$ onto $TN_2(g(x))$.
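Condition (3.15) is a rank condition: $TN_1(y) + TN_2(y) = \mathbb{R}^p$ exactly when spanning sets of the two tangent spaces, stacked together, have rank $p$. A small pure-Python sketch, with hypothetical names and the tangent vectors supplied by the caller:

```python
def rank(vectors, tol=1e-10):
    """Rank of a list of row vectors, by Gaussian elimination with partial pivoting."""
    rows = [[float(a) for a in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        if r == len(rows):
            break
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue                              # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def transversal_at(tangents1, tangents2, p):
    """Test (3.15): TN1(y) + TN2(y) = R^p, given spanning sets of the two tangent spaces."""
    return rank(list(tangents1) + list(tangents2)) == p
```

For the parabola $y = x^2$ and the line $y = 0$ at the origin, both tangent spaces are spanned by $(1, 0)$, so the check fails, as expected for the nontransversal pair discussed after Lemma 2. Two lines in $\mathbb{R}^3$ can never be transversal, since their tangent spaces span at most two dimensions.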

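PROOF OF THE LEMMA. Since $\dim(\mu_{\theta_0} + T\Omega_1(\theta_0)^\perp) + \dim(T\Gamma_1(\mu_{\theta_0})) = p$, the transversality of $\mu_{\theta_0} + T\Omega_1(\theta_0)^\perp$ and $\Gamma_1$ at $\mu_{\theta_0}$ will follow from showing

(3.16)  $T\Omega_1(\theta_0)^\perp \cap T\Gamma_1(\mu_{\theta_0}) = \{0\}$.

For (3.16), suppose $g : [-1,1] \to \Gamma_1$ is a smooth map such that $g(0) = \mu_{\theta_0}$; then since $\nabla\psi : \Omega_1 \to \Gamma_1$ is a diffeomorphism, there is a smooth $f : [-1,1] \to \Omega_1$ such that $f(0) = \theta_0$ and $g(t) = \nabla\psi(f(t))$. Now

$g'(0) = \nabla^2\psi(f(0)) \cdot f'(0) = \Psi(\theta_0) \cdot f'(0)$.

Since $f'(0) \in T\Omega_1(\theta_0)$ and $\Psi(\theta_0)$ is positive definite, $g'(0) \perp T\Omega_1(\theta_0)$ would force $f'(0)^T \Psi(\theta_0) f'(0) = 0$, hence $f'(0) = 0$ and $g'(0) = 0$. This proves (3.16).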
Fix $\theta_0 \in \Omega_1$. There exists a neighborhood $N(\mu_{\theta_0})$ in $\Gamma$ such that if $x \in N(\mu_{\theta_0})$, then

(3.17)  $\sup\{\theta^T x - \psi(\theta) : \theta \in \Omega_1 \text{ and } \mu_\theta \notin N(\mu_{\theta_0})\} < \sup\{\theta^T x - \psi(\theta) : \theta \in \Omega_1 \text{ and } \mu_\theta \in N(\mu_{\theta_0})\}$,

and such that the supremum on the LHS of (3.17) is not attained. Now $N(\mu_{\theta_0})$ may be chosen small enough that the affine spaces $\mu_\theta + T\Omega_1(\theta)^\perp$ give a "smooth fibration" of $N(\mu_{\theta_0})$: i.e., for each $x \in N(\mu_{\theta_0})$ there is a unique $\theta_x$ for which $x \in \mu_{\theta_x} + T\Omega_1(\theta_x)^\perp$, and the map $x \mapsto \theta_x$ is smooth and submersive. (This relies on the fact that the spaces $\mu_\theta + T\Omega_1(\theta)^\perp$ intersect $\Gamma_1$ transversally at $\mu_\theta$, together with the fact that the map $\theta \mapsto T\Omega_1(\theta)$ is a smooth mapping into the set of $q_1$-dimensional vector subspaces of $\mathbb{R}^p$.) But the necessary condition (3.9) together with (3.16) implies that for $x \in N(\mu_{\theta_0})$, $\theta_x = \hat\theta_1(x)$.

H0  x  1 

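Next, suppose that $x \in \Gamma$ has an open neighborhood $N_x \subset U_i$ such that $\hat\theta_i : N_x \to \Omega_i$ is a smooth submersion. Then

$(\partial/\partial x_j)\phi_i(x) = (\partial/\partial x_j)(\hat\theta_i(x)^T x - \psi(\hat\theta_i(x)))$
$= (\hat\theta_i(x))_j + \sum_k [(\partial/\partial x_j)(\hat\theta_i(x))_k]\, x_k - \sum_k [(\partial/\partial x_j)(\hat\theta_i(x))_k]\,(\partial\psi/\partial\theta_k)(\hat\theta_i(x))$
$= (\hat\theta_i(x))_j$,

since by (3.9) $x - \nabla\psi(\hat\theta_i(x)) \perp T\Omega_i(\hat\theta_i(x))$, while the vector $(\partial/\partial x_j)\hat\theta_i(x)$ lies in $T\Omega_i(\hat\theta_i(x))$. This proves (3.12). It now follows from (3.12) that

(3.18)  $\nabla_x^2 \phi_i(x) = \left((\partial/\partial x_j)(\hat\theta_i(x))_k\right)_{j,k}$;

since $x \mapsto \hat\theta_i(x)$ is submersive, this matrix includes $T\Omega_i(\hat\theta_i(x))$ in its range, and clearly $T\Omega_i(\hat\theta_i(x))^\perp$ is contained in the kernel. Thus $\nabla_x^2\phi_i(x)$ is invertible on $T\Omega_i(\hat\theta_i(x))$. On the other hand it is nonnegative definite on $\mathbb{R}^p$ since $\phi_i$ is convex. Consequently it must be strictly positive definite on $T\Omega_i(\hat\theta_i(x))$. It may also be shown using (3.18) that $\nabla_x^2\phi_i(x)$ vanishes on $T\Omega_i(\hat\theta_i(x))^\perp$. ///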

Lemma 1 gives a partial indication of the importance of two topological regularity properties: namely, transversality conditions and the submersiveness of the MLE maps. Another reason the transversality conditions figure in the analysis stems from the following purely topological fact, which will be exploited in Section 6.


LEMMA 2. Suppose $V$ is an open subset of $\mathbb{R}^p$, and $N_1, N_2$ are relatively closed submanifolds of $V$. Let $K$ be a compact subset of $V$ such that if $x \in N_1 \cap N_2 \cap K$, then $N_1$ and $N_2$ meet transversally at $x$. Then for any $\epsilon > 0$ there exist $\delta > 0$, $\delta' > 0$, and $a_0 > 0$ such that for any $a \ge a_0$ and any $y \in V$

(3.19)  $\mathrm{dist}(y, N_1 \cap N_2 \cap K) > \epsilon/a$  and  $\mathrm{dist}(y, N_1 \cap K) < \delta/a$

implies

(3.20)  $\mathrm{dist}(y, N_2) > \delta'/a$.

In other words, a point cannot be far from the intersection without being far from one or the other of the two manifolds. This is manifestly untrue of manifolds which intersect nontransversally: e.g.,

$N_1 = \{(x,y) \in \mathbb{R}^2 : y = x^2\}$,
$N_2 = \{(x,y) \in \mathbb{R}^2 : y = 0\}$.
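The failure is easy to quantify. For the point $(t, t^2/2)$ the distance to $N_1 \cap N_2 = \{(0,0)\}$ is at least $t$, while the distances to $N_1$ and $N_2$ are at most $t^2/2$; taking $t$ of order $a^{-1/2}$ therefore satisfies the hypotheses of Lemma 2 for large $a$ but violates its conclusion, whatever constants are chosen. A short Python check (hypothetical function name):

```python
import math


def nontransversal_distances(t):
    """For the point (t, t**2 / 2) with N1 = {y = x**2} and N2 = {y = 0}, return
    (distance to N1 ∩ N2 = {(0, 0)}, an upper bound on the distance to N1,
    the distance to N2)."""
    d_intersection = math.hypot(t, t * t / 2)
    d_n1_upper = abs(t * t - t * t / 2)   # the vertical gap to the parabola bounds the distance
    d_n2 = t * t / 2
    return d_intersection, d_n1_upper, d_n2
```

As $t \to 0$ the ratio of either manifold distance to the intersection distance is at most $t/2$, so no bound of the form (3.20) can hold with the same scaling as (3.19).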

PROOF. We will give only a rough outline of the argument. Suppose first that $N_1$ and $N_2$ are affine subspaces of $\mathbb{R}^p$: the existence of $\delta$ and $\delta'$ follows from the construction of disjoint angular corridors around $N_1$ and $N_2$, as illustrated by Figure 3.1.

In the general case, $N_1$ and $N_2$ may be approximated to first order by the appropriate translates of their tangent spaces; if $N_1$ and $N_2$ intersect transversally at $x$, then the tangent spaces $TN_1(x)$ and $TN_2(x)$ intersect transversally. Angular corridors may then be constructed as before. Thus for each $x \in N_1 \cap N_2 \cap K$ there is a closed neighborhood $U_x$ of $x$ in $V$ and constants $\delta_x$, $\delta'_x$, $a_0(x)$ such that for $a \ge a_0(x)$ and $y \in V$

$\mathrm{dist}(y, N_1 \cap N_2 \cap K \cap U_x) > \epsilon/a$

and

$\mathrm{dist}(y, N_1 \cap K \cap U_x) < \delta_x/a$

imply

$\mathrm{dist}(y, N_2) > \delta'_x/a$.

The lemma now follows from a compactness argument (since $N_1$ and $N_2$ are relatively closed in $V$ and $K$ is compact, $N_1 \cap N_2 \cap K$ is compact). ///

The conclusion of the main theorem depends heavily on the assumption that the MLE maps behave nicely near a certain critical manifold, and also that the manifold $\Gamma_1$ not contort itself too strenuously in certain regions of $\Gamma$. Let $\theta_0 \in \Omega_0$ and define $\epsilon_0(\theta_0)$ to be the largest extended real such that the following three conditions are satisfied:
I. For every $\epsilon$, $0 < \epsilon < \epsilon_0$, the set $\{x \in \Gamma : \phi(x) - \theta_0^T x + \psi(\theta_0) \le \epsilon\}$ is compact.

II. For each $x \in \Gamma_1 \cap (\mu_{\theta_0} + T\Omega_0(\theta_0)^\perp)$ such that $\phi_1(x) - \phi_0(x) < \epsilon_0$, there exists a neighborhood $N_x$ of $x$ in $\Gamma$ such that $N_x \subset U_0$ and $\hat\theta_0$ is a smooth submersion on $N_x$, with $\hat\theta_0(x) = \theta_0$.

III. For each $x \in \Gamma_1 \cap (\mu_{\theta_0} + T\Omega_0(\theta_0)^\perp)$ such that $\phi_1(x) - \phi_0(x) < \epsilon_0$, $\Gamma_1$ intersects $\mu_{\theta_0} + T\Omega_0(\theta_0)^\perp$ transversally at $x$, and $\Gamma_1 \cap (\mu_{\theta_0} + T\Omega_0(\theta_0)^\perp)$ intersects the level surface $\{y \in \Gamma : \phi(y) - \theta_0^T y = \phi(x) - \theta_0^T x\}$ transversally at $x$.

Note that there is always a positive $\epsilon$ such that $\{x \in \Gamma : \phi(x) - \theta_0^T x + \psi(\theta_0) \le \epsilon\}$ is compact, since $\{x \in \mathbb{R}^p : \phi(x) - \theta_0^T x + \psi(\theta_0) \le \epsilon\}$ is compact (cf. (3.5), and reparametrize the exponential family). That $\epsilon_0(\theta_0) > 0$ may be deduced from this and Lemma 1. It should be noticed that in the special case $\Omega = \Omega_1$ condition III is automatically satisfied, and in case $\Gamma = \mathbb{R}^p$, condition I is automatically satisfied.


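THEOREM 2. Suppose $0 < \epsilon_1 < \epsilon_2 < \epsilon_0(\theta_0)$; recall that

(3.21)  $T_a = T = \min\{n \ge a\epsilon_2^{-1} : \Lambda_n > a\}$.

Then as $a \to \infty$

(3.22)  $P_{\theta_0}\{T_a \le a\epsilon_1^{-1}\} \sim a^{(q_1-q_0)/2} e^{-a}\, C(\epsilon_1, \epsilon_2; \theta_0)$,

where

(3.23)  $C(\epsilon_1, \epsilon_2; \theta_0) = \int_{M(\epsilon_1,\epsilon_2;\theta_0)} \nu(y)\,\bigl(2\pi(\phi_1(y) - \phi_0(y))\bigr)^{-(q_1-q_0)/2} \left[\frac{\det(H_2(y))}{\det(\Psi(\hat\theta(y))(H_1(y) + H_3(y)))}\right]^{1/2} \sigma(dy)$,

(3.24)  $M(\epsilon_1, \epsilon_2; \theta_0) = \{y \in \Gamma_1 \cap (\mu_{\theta_0} + T\Omega_0(\theta_0)^\perp) : \epsilon_1 \le \phi_1(y) - \phi_0(y) \le \epsilon_2\}$,

(3.25)  $\nu(y) = \lim_{a\to\infty} E_{\hat\theta(y)}\, e^{-(\Lambda_T - a)}$,

(3.26)  $H_1(y) = \Psi(\hat\theta(y))^{-1} P\, H_2(y)^{-1} P\, \Psi(\hat\theta(y))^{-1}$,

(3.27)  $H_2(y) = P\,\Psi(\hat\theta(y))^{-1} P \,\big|\, (T\Omega_0(\theta_0)^\perp \cap T\Gamma_1(y))$,

(3.28)  $H_3(y) = \Psi(\hat\theta(y))^{-1} - \nabla_y^2\phi_1(y) + \nabla_y^2\phi_0(y)$;

$P$ is the orthogonal projection operator onto the space $T\Omega_0(\theta_0)^\perp \cap T\Gamma_1(y)$, and $\sigma$ is the volume element measure for the manifold-with-boundary $M(\epsilon_1, \epsilon_2; \theta_0)$.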

Many comments are in order. First, conditions I and III (transversality) imply that $M(\epsilon_1, \epsilon_2; \theta_0)$ really is a compact manifold-with-boundary: this is a consequence of the Implicit Function Theorem. Second, the "$\det H_2(y)$" which appears in the numerator may be confusing: $H_2(y)$ is a (positive definite) operator on $T\Gamma_1(y) \cap T\Omega_0(\theta_0)^\perp$, and the determinant is simply meant to be the product of its eigenvalues on $T\Gamma_1(y) \cap T\Omega_0(\theta_0)^\perp$. Third, it remains to be seen that the integral is finite, and in fact that the integrand is defined (cf. (3.25)). Notice that $\phi_1 - \phi_0$ is a continuous function which is bounded away from zero on $M(\epsilon_1, \epsilon_2; \theta_0)$. The other two factors of importance require more care.
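The convention for $\det H_2(y)$ can be made concrete: choose an orthonormal basis $b_1, \ldots, b_k$ of the subspace, form the $k \times k$ matrix $(b_i^T A b_j)$, and take its ordinary determinant; the result does not depend on the basis chosen. A pure-Python sketch (hypothetical names), assuming $A$ is symmetric and the subspace is given by a spanning set:

```python
def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis (as a list of rows) for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = [float(a) for a in v]
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = sum(x * x for x in w) ** 0.5
        if norm > tol:                       # drop vectors already in the span
            basis.append([x / norm for x in w])
    return basis


def det(m):
    """Determinant by cofactor expansion (adequate for the small matrices here)."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def restricted_det(A, spanning):
    """det of the symmetric matrix A compressed to span(`spanning`), i.e. det(B^T A B)
    with B an orthonormal basis: the product of the eigenvalues of P A P there."""
    B = gram_schmidt(spanning)
    p = len(A)
    AB = [[sum(A[i][l] * b[l] for l in range(p)) for i in range(p)] for b in B]
    G = [[sum(bi[l] * abj[l] for l in range(p)) for abj in AB] for bi in B]
    return det(G)
```

For $A = \mathrm{diag}(1, 2, 3)$ restricted to the $(x_1, x_2)$-plane, any spanning set of that plane gives the value 2, the product of the two eigenvalues that live there.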


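LEMMA 3. For every $y \in M(\epsilon_1, \epsilon_2; \theta_0)$ the matrix $H_1(y) + H_3(y)$ is strictly positive definite on $\mathbb{R}^p$.

PROOF. Since $\Psi(\theta)$ is everywhere strictly P.D., it is clear that $H_1(y)$ is N.N.D. on $\mathbb{R}^p$ and strictly P.D. on $T\Omega_0(\theta_0)^\perp \cap T\Gamma_1(y)$. Also if $y \in M(\epsilon_1, \epsilon_2; \theta_0)$, then since $\epsilon_2 < \epsilon_0(\theta_0)$ and $\epsilon_0(\theta_0)$ satisfies condition II, it follows from Lemma 1 that $\nabla_y^2\phi_0(y)$ is N.N.D. on $\mathbb{R}^p$ and strictly P.D. on $T\Omega_0(\theta_0)$.

Now consider $\nabla_y^2(\phi - \phi_1)(y)$; since $\nabla_y^2\phi(y) = \Psi(\hat\theta(y))^{-1}$ by (3.4), $H_3(y) = \nabla_y^2(\phi - \phi_1)(y) + \nabla_y^2\phi_0(y)$. Since $\phi - \phi_1$ is a nonnegative function on $\mathbb{R}^p$ which is zero on $\Gamma_1$, $\nabla_y^2(\phi - \phi_1)(y)$ is N.N.D. on $\mathbb{R}^p$ whenever $y \in \Gamma_1$, by Taylor's Theorem. Furthermore, if $y \in \Gamma_1$, then $\nabla_y^2\phi_1(y)$ is zero on the vector subspace $T\Omega_1(\hat\theta(y))^\perp$ (cf. (3.14) of Lemma 1). Thus $\nabla_y^2(\phi - \phi_1)(y)$ is strictly P.D. on $T\Omega_1(\hat\theta(y))^\perp$ for each $y \in \Gamma_1$. Now since $y + T\Omega_1(\hat\theta(y))^\perp$ intersects $\Gamma_1$ transversally at $y$ (Lemma 1 again), it follows that $\nabla_y^2(\phi - \phi_1)(y)$ is strictly P.D. on $T\Gamma_1(y)^\perp$.

But $T\Gamma_1(y)^\perp + T\Omega_0(\theta_0) + (T\Omega_0(\theta_0)^\perp \cap T\Gamma_1(y)) = \mathbb{R}^p$, so $H_1(y) + H_3(y)$ is strictly P.D. on $\mathbb{R}^p$. ///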

As for $\nu(y)$, the existence of the limit in (3.25) is a consequence of a general theorem of Lai and Siegmund [9]. In order that their theorem be applicable, however, a certain random walk associated with the process $\{\Lambda_n\}$ must be nonlattice. In the next lemma it is shown that this is so for almost every $y \in M(\epsilon_1, \epsilon_2; \theta_0)$.

LEMMA 4. For all $0 < \epsilon_1 < \epsilon_2 < \epsilon_0(\theta_0)$,

(3.29)  $\sigma\{y \in M(\epsilon_1, \epsilon_2; \theta_0) : (\phi_1(y) - \phi_0(y)) + (X_1 - y)^T \nabla_y(\phi_1 - \phi_0)(y) \text{ has a lattice distribution when } X_1 \sim P_{\hat\theta(y)}\} = 0$,

where $\sigma(\cdot)$ denotes the volume element measure on $M(\epsilon_1, \epsilon_2; \theta_0)$.

PROOF. Call a point $\omega \in \mathbb{R}^p$ a support point of the exponential family if for some $\theta \in \Omega$, $P_\theta\{u : |u - \omega| < \epsilon\} > 0$ for each $\epsilon > 0$. Clearly if $\omega$ is a support point, then $P_{\theta^*}\{u : |u - \omega| < \epsilon\} > 0$ for every $\theta^* \in \Omega$ and $\epsilon > 0$. Because the covariance matrices $\Psi(\theta)$ are strictly P.D., there exist support points $\omega_1, \ldots, \omega_p$ which form a (vector space) basis for $\mathbb{R}^p$. Let

$I(y) = \phi_1(y) - \phi_0(y)$,  $y \in \Gamma_1$.

A necessary condition for $I(y) + (X_1 - y)^T \nabla_y I(y)$ to have a lattice distribution (under $P_{\hat\theta(y)}$) is that for any pair $\omega_i, \omega_j$ of the atoms there exist a rational number $q_{\{i,j\}}$ such that either

(3.30)  $q_{\{i,j\}}\,(I(y) + (\omega_i - y)^T \nabla_y I(y)) = I(y) + (\omega_j - y)^T \nabla_y I(y)$

or

(3.31)  $q_{\{i,j\}}\,(I(y) + (\omega_j - y)^T \nabla_y I(y)) = I(y) + (\omega_i - y)^T \nabla_y I(y)$.

Suppose that there is a $y = y_0 \in M(\epsilon_1, \epsilon_2; \theta_0)$ such that conditions (3.30)-(3.31) are satisfied for $y = y_0$ and some particular set $\{q_{\{i,j\}}\}$ of $\binom{p}{2}$ rational numbers: we will show that there is a neighborhood $N(y_0)$ of $y_0$ such that if $y \in N(y_0) \cap \Gamma_1 \cap (\mu_{\theta_0} + T\Omega_0(\theta_0)^\perp)$ and $y \ne y_0$, then (3.30)-(3.31) are not satisfied for $y$ and the same set of rationals $\{q_{\{i,j\}}\}$. By the countability of the rationals this will prove (3.29).

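Suppose the indices are labelled so that alternative (3.30) holds. Consider the first order Taylor series of

(3.32)  $$f_{\{i,j\}}(y) = I(y)(1 - q_{\{i,j\}}) + \big(\omega_i - q_{\{i,j\}}\omega_j - y(1 - q_{\{i,j\}})\big)^T \nabla_y I(y)$$

around $y = y_0$. Since $f_{\{i,j\}}(y_0) = 0$ by (3.30), and the two terms $\pm(1 - q_{\{i,j\}})\nabla_y I(y)$ in the gradient of $f_{\{i,j\}}$ cancel, this Taylor series is

(3.33)  $$Tf_{\{i,j\}}(y) = \big(\omega_i - q_{\{i,j\}}\omega_j - y_0(1 - q_{\{i,j\}})\big)^T \nabla_y^2 I(y_0)\,(y - y_0).$$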

Lemma 1 and the transversality condition III imply that $\nabla_y^2 I(y_0)$, restricted to $T\Omega_0(\theta_0)^\perp \cap T\Gamma_1(y_0)$, is strictly P.D. Consequently, because $\{\omega_i\}$ is a basis for $\mathbb{R}^p$, for each $u \in T\Omega_0(\theta_0)^\perp \cap T\Gamma_1(y_0)$ satisfying $|u| = 1$ there is a pair $\{i,j\}$ such that


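(3.34)  $$Tf_{\{i,j\}}(y_0 + tu) \neq 0 \quad \text{whenever } t \neq 0.$$

Since $Tf_{\{i,j\}}$ is the principal term in $f_{\{i,j\}}$, and since $B = \{u \in T\Omega_0(\theta_0)^\perp \cap T\Gamma_1(y_0) : |u| = 1\}$ is compact, there is a $\delta > 0$ such that for any $y \in \Gamma_1 \cap (\mu_{\theta_0} + T\Omega_0(\theta_0)^\perp)$ satisfying $0 < |y - y_0| < \delta$,

$$f_{\{i,j\}}(y) \neq 0 \quad \text{for some pair } \{i,j\}.$$

This proves the lemma. ///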

Although this lemma, together with the result of Lai and Siegmund, shows that the function $\nu(y)$ is well-defined for almost every $y$ $(d\sigma)$, the argument gives no means of evaluating it. However, an important result of M. Woodroofe expresses $\nu(y)$ as an integral involving only the characteristic function of the random variable considered in Lemma 4. Woodroofe's theorem makes it possible to evaluate the constant appearing in Theorem 2 by numerical integration in many cases of statistical interest. His paper [16] contains not only a proof of the theorem but several interesting examples of its use. (NOTE: Woodroofe's theorem actually carries certain hypotheses concerning the smoothness of the underlying distributions which are unnecessary, as an elementary modification of his proof shows.)
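The following minimal sketch makes the remark about numerical evaluation concrete in the simplest special case. That case is our own choice and plays no part in the argument above: a random walk whose increments are normal with mean $\mu^2/2$ and variance $\mu^2$, i.e. the log-likelihood-ratio walk for $N(\mu,1)$ against $N(0,1)$. For this walk the constant has the well-known series representation $\nu(\mu) = 2\mu^{-2}\exp\{-2\sum_{n\ge1} n^{-1}\Phi(-\mu n^{1/2}/2)\}$ (Siegmund). The sketch does not implement Woodroofe's characteristic-function integral. The function names and the truncation rule are illustrative assumptions.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Phi(x), written with the complementary error function for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def nu_normal(mu: float, tail: float = 12.0) -> float:
    """nu(mu) for a random walk with N(mu^2/2, mu^2) increments, from the series
        nu(mu) = 2 mu^{-2} exp{-2 * sum_{n >= 1} n^{-1} Phi(-mu sqrt(n) / 2)}.
    The sum is truncated once mu*sqrt(n)/2 exceeds `tail` (Phi(-12) is about 2e-33)."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    n_max = int((2.0 * tail / mu) ** 2) + 1
    s = math.fsum(std_normal_cdf(-mu * math.sqrt(n) / 2.0) / n for n in range(1, n_max + 1))
    return 2.0 / mu**2 * math.exp(-2.0 * s)
```

For small $\mu$ the values are close to the familiar approximation $\nu(\mu) \approx \exp(-0.583\mu)$. For example, `nu_normal(0.1)` can be compared with `math.exp(-0.0583)`.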

Theorem 2 generalizes another theorem of Woodroofe (Theorem 3 of [15]), which essentially covers the case $\Omega_0 = \{\theta_0\}$ but under smoothness conditions on the distributions $\{P_\theta\}$ that rule out all problems involving categorical data. His proof seems to be very much tied to these assumptions, and bears no resemblance to the approach used in this work.


For the proof of Theorem 2 we will assume that $0 \in \Omega_0$ and $\theta_0 = 0$. For arbitrary $\theta_0 \in \Omega_0$ we may always reduce to this case by reparametrizing and recentering the exponential family.
4. Normal Approximation in Several Dimensions


Certain refinements of the multidimensional Central Limit Theorem play a key role in the analysis which follows. Of primary importance are (1) bounds on the probability of moderate deviations of the sample mean, and (2) uniformity in the convergence of $n^{-1/2}(S_n - n\mu_\theta)$ (under $P_\theta$) over compact subsets of the natural parameter space.

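Let $\Sigma$ be a symmetric positive definite matrix on $\mathbb{R}^p$, and let $Q_\Sigma$ be the Gaussian measure on $\mathbb{R}^p$ with mean zero and covariance $\Sigma$, i.e.,

(4.1)  $$Q_\Sigma(A) = (\det\Sigma)^{-1/2}(2\pi)^{-p/2}\int_A \exp\{-y^T\Sigma^{-1}y/2\}\,dy$$

for Borel sets $A$. In addition, let $\mathcal{G}_n$ be the class of $p$-dimensional half-spaces

(4.2)  $$A_\ell(a) = \{y \in \mathbb{R}^p : \ell^T y > a\}$$

where

$$|\ell| = 1 \quad\text{and}\quad 0 \le a \le n^{1/6}/\log n.$$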
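The denominators in (4.3)-(4.4) can be computed explicitly. If $Y \sim Q_\Sigma$, then $\ell^TY \sim N(0, \ell^T\Sigma\ell)$, so $Q_\Sigma(A_\ell(a)) = 1 - \Phi(a/(\ell^T\Sigma\ell)^{1/2})$. Here is a short sketch, assuming half-spaces of the form $\{y : \ell^Ty > a\}$ as in (4.2). The function name is hypothetical.

```python
import math
import numpy as np

def halfspace_gaussian_prob(Sigma, ell, a):
    """Q_Sigma(A_ell(a)) for A_ell(a) = {y : ell^T y > a}, with |ell| = 1.
    If Y ~ N(0, Sigma) then ell^T Y ~ N(0, ell^T Sigma ell), so this is a normal tail."""
    Sigma = np.asarray(Sigma, dtype=float)
    ell = np.asarray(ell, dtype=float)
    s = math.sqrt(float(ell @ Sigma @ ell))      # standard deviation of ell^T Y
    return 0.5 * math.erfc(a / (s * math.sqrt(2.0)))
```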


PROPOSITION 1. Let $K$ be any compact subset of $\Omega$ (the natural parameter space of the exponential family $\{P_\theta\}$, which is assumed to be an open subset of $\mathbb{R}^p$). Then

(4.3)  $$\limsup_{n\to\infty}\ \sup_{\theta\in K}\ \sup_{A\in\mathcal{G}_n} P_\theta\{n^{-1/2}(S_n - n\mu_\theta)\in A\}/Q_{\Sigma(\theta)}(A) = 1,$$

(4.4)  $$\liminf_{n\to\infty}\ \inf_{\theta\in K}\ \inf_{A\in\mathcal{G}_n} P_\theta\{n^{-1/2}(S_n - n\mu_\theta)\in A\}/Q_{\Sigma(\theta)}(A) = 1.$$

The proof of this is a rather tedious modification of the proof of Cramér's Theorem (cf. Feller [7], Chapter XVI, Section 7). The only real novelty is the uniformity in $\theta$, and this is handled by compactness. For $\theta$ in any compact subset of $\Omega$, the third moments $\{E_\theta|\ell^T(X_1 - \mu_\theta)|^3 : |\ell| = 1\}$ are uniformly bounded away from $\infty$, and the second moments $\{E_\theta|\ell^T(X_1 - \mu_\theta)|^2 : |\ell| = 1\}$ are uniformly bounded away from zero (recall that the covariance matrices $\Sigma(\theta)$ were assumed to be positive definite on $\Omega$). Thus the Berry-Esseen Theorem provides a bound for the error in the normal approximation to the $P_\theta$-distribution of $n^{-1/2}\ell^T(S_n - n\mu_\theta)$ which is uniform in $\theta$ and $\ell$. Moreover, the errors in the Taylor series expansions used in the proof are all uniformly small, again by the compactness of $K$.

COROLLARY 1. Let $K$ be a compact subset of $\Omega$. Then for all $\delta > 0$ and $k > 0$

(4.5)  $$\max_{\theta\in K} P_\theta\{|S_n - n\mu_\theta| > \delta n^{1/2}\log n\} = o(n^{-k}),$$

and for all $\epsilon$ such that $0 < \epsilon < 1/6$

(4.6)  $$\max_{\theta\in K} P_\theta\{|S_n - n\mu_\theta| > \delta n^{1/2+\epsilon}\} = O(\exp\{-\delta^2 n^{2\epsilon}/2\}),$$

as $n \to \infty$.

The  proof  is  straightforward. 

COROLLARY 2. Let $K$ be a compact subset of $\Omega$, and suppose $\theta \mapsto A(\theta)$ is a continuous function of $\theta \in K$ with values in the set of symmetric positive definite $p \times p$ matrices. Then

(4.7)  $$E_\theta\,1_{B(n,\theta)}\exp\{(S_n - n\mu_\theta)^T(\Sigma(\theta)^{-1} - A(\theta))(S_n - n\mu_\theta)/2n\} \to (\det A(\theta)\,\det\Sigma(\theta))^{-1/2}$$

as $n \to \infty$. Here $B(n,\theta)$ is the event

(4.8)  $$B(n,\theta) = \{|S_n - n\mu_\theta| \le n^{1/2}a_n\},$$

and $\{a_n\}$ is any sequence of constants such that $a_n \to \infty$ and $a_n = O(n^{1/6}/\log n)$. Furthermore, the convergence in (4.7) is uniform for $\theta \in K$, for each sequence $\{a_n\}$.
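The limit in (4.7) is the value of the corresponding Gaussian integral. If $Y \sim N(0,\Sigma)$, the factor $\exp\{Y^T(\Sigma^{-1} - A)Y/2\}$ cancels the $\Sigma^{-1}$ in the normal density, leaving $(\det\Sigma)^{-1/2}\int(2\pi)^{-p/2}\exp\{-y^TAy/2\}\,dy = (\det A\,\det\Sigma)^{-1/2}$. The sketch below checks this numerically in $p = 2$ on a grid. The matrices used in the check are arbitrary test values, not taken from the text.

```python
import numpy as np

def quadratic_mgf_grid(Sigma, A, half_width=12.0, m=1201):
    """Grid approximation, in p = 2, of E exp{Y^T (Sigma^{-1} - A) Y / 2} with Y ~ N(0, Sigma)."""
    Sigma = np.asarray(Sigma, dtype=float)
    A = np.asarray(A, dtype=float)
    Sinv = np.linalg.inv(Sigma)
    g = np.linspace(-half_width, half_width, m)
    h = g[1] - g[0]
    Y = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1)   # grid points, shape (m, m, 2)
    q_density = np.einsum("...i,ij,...j->...", Y, Sinv, Y)     # y^T Sigma^{-1} y
    q_tilt = np.einsum("...i,ij,...j->...", Y, Sinv - A, Y)    # y^T (Sigma^{-1} - A) y
    log_integrand = (q_tilt - q_density) / 2.0                  # combine exponents before exp
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(Sigma))
    return np.exp(log_integrand).sum() * h * h / norm

def quadratic_mgf_closed_form(Sigma, A):
    """(det A det Sigma)^{-1/2}, the right side of (4.7)."""
    return (np.linalg.det(A) * np.linalg.det(Sigma)) ** -0.5
```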

PROOF. This is accomplished in two stages: Theorem 1 is used to establish the uniform integrability of the random variables, and Bhattacharya's multidimensional extension of the Berry-Esseen Theorem is used for the integration.

BHATTACHARYA'S THEOREM (cf. Bhattacharya and Rao [3], Corollary 15.2). Suppose $X_1, X_2, \ldots$ are i.i.d. random vectors in $\mathbb{R}^p$ with mean zero, covariance $I$, and finite absolute third moment $\rho_3 = E|X_1|^3$. Then for each bounded Borel measurable function $f : \mathbb{R}^p \to \mathbb{R}$

$$\Big|Ef(S_n/n^{1/2}) - \int f(y)e^{-|y|^2/2}\,dy/(2\pi)^{p/2}\Big| \le c_1 M(f)\rho_3 n^{-1/2} + 2\,\omega(f; c_2\rho_3 n^{-1/2}),$$

where

$$M(f) = \sup_{x,y\in\mathbb{R}^p}|f(x) - f(y)|,$$

$$\omega(f;\epsilon) = \sup_{u\in\mathbb{R}^p}\int_{\mathbb{R}^p}\sup\{|f(x_1 + u) - f(x_2 + u)| : |x_1 - y| \vee |x_2 - y| < \epsilon\}\,e^{-|y|^2/2}\,dy/(2\pi)^{p/2},$$

$$S_n = X_1 + \cdots + X_n,$$

and $c_1, c_2$ are universal constants (which may depend on the dimension $p$).


The idea is to apply this result to the functions

$$f_{\theta,b}(y) = g_\theta(y)\,1\{g_\theta(y) \le b\},$$

where

$$g_\theta(y) = \exp\{y^T(I - \Sigma(\theta)^{1/2}A(\theta)\Sigma(\theta)^{1/2})y/2\}.$$


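It is clear that for fixed $b$,

$$\sup_{\theta\in K} M(f_{\theta,b}) < \infty$$

and that

$$\lim_{\epsilon\to 0}\ \sup_{\theta\in K}\ \omega(f_{\theta,b};\epsilon) = 0,$$

$$\lim_{b\to\infty}\int f_{\theta,b}(y)e^{-|y|^2/2}\,dy/(2\pi)^{p/2} = (\det A(\theta)\,\det\Sigma(\theta))^{-1/2}$$

uniformly for $\theta \in K$. Thus to prove (4.7), it suffices to show that for any $\epsilon > 0$ there is a $b$ so large that, for all sufficiently large $n$,

(4.9)  $$\sup_{\theta\in K} E_\theta\,1_{B(n,\theta)}\,1\{g_\theta(\Sigma(\theta)^{-1/2}(S_n - n\mu_\theta)n^{-1/2}) > b\}\,g_\theta(\Sigma(\theta)^{-1/2}(S_n - n\mu_\theta)n^{-1/2}) < \epsilon.$$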

Let

$$\delta = 1 \wedge \inf\{x^TA(\theta)x/x^T\Sigma(\theta)^{-1}x : \theta \in K,\ |x| = 1\};$$

since $K$ is compact, $\delta > 0$. Then for each $\theta \in K$ there is a polyhedron $R(\theta)$ such that for every $y \in R(\theta)$

$$y^T(\Sigma(\theta)^{-1} - A(\theta))y \le 2$$

and

$$y^T\Sigma(\theta)^{-1}y \ge 2(1 - \delta/2)^{-1}.$$

This follows from the definition of $\delta$ by piecing together patches of hyperplanes along the level surface $\{y \in \mathbb{R}^p : y^T(\Sigma(\theta)^{-1} - A(\theta))y = 2\}$ (cf. Figures 4.1 and 4.2). Furthermore, since $\Sigma(\theta)$ and $A(\theta)$ are continuous in $\theta$, $R(\theta)$ may be chosen "continuously" in $\theta$:




In particular, it may be assumed that there is a finite integer $m$ such that for each $\theta \in K$, $R(\theta)$ has no more than $m$ distinct faces.


Figure 4.1, Figure 4.2: the level surface $y^T(\Sigma(\theta)^{-1} - A(\theta))y = 2$.


Each polyhedron $R(\theta)$ separates $\mathbb{R}^p$ into a bounded component and an unbounded component, which will be denoted $R^{INT}(\theta)$ and $R^{EXT}(\theta)$. Now Theorem 1 and a crude bound on the tail of the cumulative normal distribution function imply the following bound for $\log^{1/2} b \le a \le c\,a_n$ (with $c > 0$ a suitable constant), $\theta \in K$, and sufficiently large $n$:

$$P_\theta\{(S_n - n\mu_\theta)n^{-1/2} \in a\,R^{EXT}(\theta)\} \le 2m\exp\{-a^2/(1 - \delta/2)\}.$$

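But

(4.10)  $$E_\theta\,1_{B(n,\theta)}\,1\{g_\theta(\Sigma(\theta)^{-1/2}(S_n - n\mu_\theta)n^{-1/2}) > b\}\,g_\theta(\Sigma(\theta)^{-1/2}(S_n - n\mu_\theta)n^{-1/2})$$
$$\le \sum_{k=0}^{\infty} P_\theta\{(S_n - n\mu_\theta)n^{-1/2} \in (\log^{1/2}b)(1 - \delta/4)^{-k/2}R^{EXT}(\theta);\ |S_n - n\mu_\theta|n^{-1/2} \le a_n\}$$
$$\qquad\cdot\max\{g_\theta(\Sigma(\theta)^{-1/2}y) : y \in (\log^{1/2}b)(1 - \delta/4)^{-k/2}R^{EXT}(\theta),\ y \notin (\log^{1/2}b)(1 - \delta/4)^{-(k+1)/2}R^{EXT}(\theta)\}$$
$$\le 2m\sum_{k=0}^{\infty}\exp\{(\log b)[(1 - \delta/4)^{-(k+1)} - (1 - \delta/4)^{-k}(1 - \delta/2)^{-1}]\}$$

for sufficiently large $n$ and all $\theta \in K$. The series on the RHS of (4.10) can be made arbitrarily small by choosing $b$ large. This proves (4.9), and hence (4.7). ///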


5. Expansion of Likelihood Functions


Let $N$ be a smooth, compact, $r$-dimensional submanifold of $\Gamma$, and let $f(\cdot)$ be a smooth, strictly positive probability density on $N$ (with respect to the "volume element" measure $\sigma(\cdot)$). Define

(5.1)  $$Q(A) = \int_N P_{\theta(y)}(A)f(y)\,\sigma(dy);$$

thus $Q$ is a probability measure on the $\sigma$-algebra $\mathcal{F}(X_1, X_2, \ldots)$.

Furthermore, the measures $P_0$ and $Q$, when restricted to the $\sigma$-algebras $\mathcal{F}(X_1, \ldots, X_n)$, are mutually absolutely continuous. Denoting these restricted measures by $P_0^{(n)}$ and $Q^{(n)}$, respectively, we have

(5.2)  $$dP_0^{(n)}/dQ^{(n)} = \Big[\int_N e^{\theta(y)^TS_n - n\psi(\theta(y))}f(y)\,\sigma(dy)\Big]^{-1}.$$

The objective of this section is the derivation of a more tractable expression for $dP_0^{(n)}/dQ^{(n)}$ when $S_n/n$ is near $N$, as $n \to \infty$.

It will be convenient to have some notation available for various matrices which will occur. Recall that the tangent space $TN(y)$ to $N$ at $y$ is the vector subspace of $\mathbb{R}^p$ defined by

(5.3)  $$TN(y) = \{v \in \mathbb{R}^p : \exists \text{ smooth } g : [-1,1] \to N \text{ with } g(0) = y \text{ and } g'(0) = v\};$$

thus $TN(y)$ is an $r$-dimensional vector space, for every $y \in N$. Let $P_{TN(y)}$ denote the orthogonal projection operator from $\mathbb{R}^p$ to $TN(y)$, and let

(5.4)  $$H_1(y) = \Sigma(\theta(y))^{-1}P_{TN(y)}H_2(y)^{-1}P_{TN(y)}\Sigma(\theta(y))^{-1},$$

(5.5)  $$H_2(y) = P_{TN(y)}\Sigma(\theta(y))^{-1}P_{TN(y)}\big|_{TN(y)}.$$

Note that $H_2(y)$, considered as an operator on $TN(y)$, is invertible, since $\Sigma(\theta(y))$ is invertible. Furthermore, since $TN(y)$ is vector-space isomorphic to $\mathbb{R}^r$, $H_2(y)$ may be interpreted as an operator on $\mathbb{R}^r$, and $\det H_2(y)$ is then unambiguously defined.
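The matrices $H_1$ and $H_2$ arise from completing the square in a Gaussian integral over $TN(y_1)$, which is the step leading to (5.12) below. Here is a minimal sketch. It assumes that the columns of a matrix `B` form an orthonormal basis of $TN(y_1)$, so that $P_{TN} = BB^T$ and $H_2$ is represented by $B^T\Sigma^{-1}B$. The sketch compares the closed form with a Riemann sum for $r = 1$, $p = 2$. All numerical values are illustrative.

```python
import numpy as np

def tangent_matrices(Sigma, B):
    """H_1 (p x p) and H_2 (r x r, in the orthonormal basis B of TN(y_1)) as in (5.4)-(5.5)."""
    Sinv = np.linalg.inv(Sigma)
    H2 = B.T @ Sinv @ B                                  # P Sigma^{-1} P restricted to TN, in coordinates
    H1 = Sinv @ B @ np.linalg.solve(H2, B.T @ Sinv)      # Sigma^{-1} P H_2^{-1} P Sigma^{-1}
    return H1, H2

def tangent_integral_closed_form(Sigma, B, z, n):
    """int_{TN} exp{n v^T Sigma^{-1} z - n v^T Sigma^{-1} v / 2} m(dv), by completing the square."""
    H1, H2 = tangent_matrices(Sigma, B)
    r = B.shape[1]
    return (2 * np.pi / n) ** (r / 2) * np.linalg.det(H2) ** -0.5 * np.exp(n * z @ H1 @ z / 2)

def tangent_integral_grid(Sigma, B, z, n, half_width=5.0, m=20001):
    """The same integral by a Riemann sum, for r = 1 (B has a single column)."""
    Sinv = np.linalg.inv(Sigma)
    u = np.linspace(-half_width, half_width, m)
    v = np.outer(u, B[:, 0])                             # points u * b on the line TN(y_1)
    expo = n * (v @ Sinv @ z) - 0.5 * n * np.einsum("ki,ij,kj->k", v, Sinv, v)
    return np.exp(expo).sum() * (u[1] - u[0])
```

In these coordinates one can also confirm that $\Sigma H_1$ is a projection ($H_1\Sigma H_1 = H_1$). This is why $\Sigma^{-1} - H_1$, which appears in (5.7), is nonnegative definite.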

PROPOSITION 1. Suppose $y_1 \in N$, and

(5.6)  $$S_n/n = y_1 + hn^{-1/2},$$

where $|h| < \delta n^{1/7}$. Then as $n \to \infty$

(5.7)  $$dP_0^{(n)}/dQ^{(n)} \sim e^{-n\phi(S_n/n)}f(y_1)^{-1}(n/2\pi)^{r/2}\cdot\exp\{h^T(\Sigma(\theta(y_1))^{-1} - H_1(y_1))h/2\}\cdot\det(H_2(y_1))^{1/2}.$$

This relation holds uniformly for $y_1 \in N$ and $|(S_n/n) - y_1| < \delta n^{1/7-1/2}$, for every $\delta > 0$.

The proof is a relatively straightforward exercise in the use of Laplace's method. The basic idea is that for large $n$ the only part of $N$ which contributes to the integral in (5.2) is a small patch around $y_1$ (essentially of radius $n^{1/7-1/2}\log n$), and that the integral over this patch is approximately equal to the integral over the tangent space at $y_1$. Because $N$ is compact, all of the errors are uniformly small.
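Laplace's method can be seen at work in a one-dimensional example that has nothing to do with exponential families (our own illustration). Write $n! = n^{n+1}\int_0^\infty \exp\{n(\log s - s)\}\,ds$. The integrand concentrates near $s = 1$, and replacing the exponent by its quadratic Taylor polynomial there gives Stirling's formula. The sketch compares the two.

```python
import math

def laplace_log_factorial(n: int) -> float:
    """Laplace's method for n! = n^{n+1} * int_0^inf exp{n (log s - s)} ds.
    The exponent peaks at s = 1 with value -1 and second derivative -1, giving
    n! ~ n^{n+1} e^{-n} (2 pi / n)^{1/2}. Returned on the log scale."""
    return (n + 1) * math.log(n) - n + 0.5 * math.log(2.0 * math.pi / n)

def laplace_ratio(n: int) -> float:
    """n! divided by its Laplace approximation; it behaves like 1 + 1/(12 n)."""
    return math.exp(math.lgamma(n + 1) - laplace_log_factorial(n))
```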
Since the relation (5.7) is crucial to all that follows, the argument will be given in detail. Taylor's Theorem yields

(5.8)  $$\theta(y)^Tx - \psi(\theta(y)) - \phi(x) = (y - y_1)^T\Sigma(\theta(y_1))^{-1}(x - y_1) - \tfrac12(y - y_1)^T\Sigma(\theta(y_1))^{-1}(y - y_1) - \tfrac12(x - y_1)^T\Sigma(\theta(y_1))^{-1}(x - y_1) + O(|y_1 - y|^3) + O(|x - y_1|^3)$$

by way of the identities

(5.9)  $$y_1 = \nabla_\theta\psi(\theta(y_1)),\qquad \theta(y_1) = \nabla_y\phi(y_1),\qquad \Sigma(\theta(y_1))^{-1} = \nabla_y^2\phi(y_1).$$
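The identities (5.9) can be checked in a concrete case. As a hypothetical example not used in the paper, the sketch below takes $P_0$ = Poisson(1). Then $P_\theta$ is Poisson($e^\theta$), $\psi(\theta) = e^\theta - 1$, $\phi(x) = x\log x - x + 1$, $\theta(y) = \log y$ and $\Sigma(\theta(y)) = y$. The sketch also compares the left side of (5.8) with its quadratic part.

```python
import math

# Hypothetical example: P_0 = Poisson(1), so that P_theta = Poisson(e^theta).
def psi(theta: float) -> float:          # cumulant function log E_0 exp(theta X) = e^theta - 1
    return math.exp(theta) - 1.0

def phi(x: float) -> float:              # convex conjugate of psi (Kullback-Leibler information)
    return x * math.log(x) - x + 1.0

def theta_of(y: float) -> float:         # inverse of the mean map theta -> psi'(theta) = e^theta
    return math.log(y)

def u(y: float, x: float) -> float:      # left side of (5.8)
    return theta_of(y) * x - psi(theta_of(y)) - phi(x)

def u_quadratic(y: float, x: float, y1: float) -> float:
    """Quadratic part of the right side of (5.8); here Sigma(theta(y1))^{-1} = 1 / y1."""
    s_inv = 1.0 / y1
    return (y - y1) * s_inv * (x - y1) - 0.5 * (y - y1) ** 2 * s_inv - 0.5 * (x - y1) ** 2 * s_inv
```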

Because $N$ is compact, the remainder terms in (5.8) are uniformly small. That is, there exist $C, \delta > 0$ such that whenever $y_1, y \in N$,

$$|y - y_1| < \delta \quad\text{and}\quad |x - y_1| < \delta$$

imply

(5.10)  $$O(|x - y_1|^3) \le C|x - y_1|^3,\qquad O(|y - y_1|^3) \le C|y - y_1|^3.$$
Relation (5.8) may now be used to integrate over the set of $y \in N$ which are within $n^{1/7-1/2}\log n$ of $y_1$. If $x \in \Gamma$ and $|x - y_1| < \delta n^{1/7-1/2}$, then

(5.11)  $$\int_N \exp\{n[\theta(y)^Tx - \psi(\theta(y)) - \phi(x)]\}\,1\{|y - y_1| \le n^{1/7-1/2}\log n\}\,f(y)\,\sigma(dy)$$
$$\sim \exp\{-n(x - y_1)^T\Sigma(\theta(y_1))^{-1}(x - y_1)/2\}\cdot f(y_1)$$
$$\quad\cdot\int_N \exp\{n(y - y_1)^T\Sigma(\theta(y_1))^{-1}(x - y_1)\}\cdot\exp\{-n(y - y_1)^T\Sigma(\theta(y_1))^{-1}(y - y_1)/2\}\,1\{|y - y_1| \le n^{1/7-1/2}\log n\}\,\sigma(dy)$$
$$\sim \exp\{-n(x - y_1)^T\Sigma(\theta(y_1))^{-1}(x - y_1)/2\}\cdot f(y_1)\cdot\int_{TN(y_1)}\exp\{ny^T\Sigma(\theta(y_1))^{-1}(x - y_1)\}\cdot\exp\{-ny^T\Sigma(\theta(y_1))^{-1}y/2\}\,m(dy),$$

where $m(dy)$ denotes the Lebesgue measure on $TN(y_1)$. Moreover, the last relation is valid uniformly for $y_1 \in N$ and $|x - y_1| \le \delta n^{1/7-1/2}$. This is because the curvature form of $N$ at $y_1$ is uniformly bounded (in operator norm) for $y_1 \in N$, $N$ being compact. The last integral may be evaluated exactly by completing the square, leaving

(5.12)  $$\int_N \exp\{n[\theta(y)^Tx - \psi(\theta(y)) - \phi(x)]\}\,1\{|y - y_1| \le n^{1/7-1/2}\log n\}\,f(y)\,\sigma(dy)$$
$$\sim \exp\{-n(x - y_1)^T\Sigma(\theta(y_1))^{-1}(x - y_1)/2\}\cdot f(y_1)\cdot(2\pi/n)^{r/2}\cdot(\det H_2(y_1))^{-1/2}\cdot\exp\{n(x - y_1)^TH_1(y_1)(x - y_1)/2\}.$$

Note that since $|x - y_1| < \delta n^{-1/2+1/7}$, the last quantity is never smaller than $\exp\{-C'n^{2/7}\}$. Thus to complete the proof of (5.7) it is sufficient to show that the integral over $\{y \in N : |y - y_1| > n^{1/7-1/2}\log n\}$ is of smaller order of magnitude.
Now the function

$$u(y) = \theta(y)^Tx - \psi(\theta(y)) - \phi(x)$$

is a nonpositive function of $y \in \Gamma$ whose only zero is at $y = x$. The Taylor series expansion (5.8) for $u(y)$ shows that there exist constants $C > 0$ and $n_0$ such that, for $n \ge n_0$, the conditions

$$|y - y_1| > n^{1/7-1/2}\log n,\qquad |x - y_1| \le \delta n^{1/7-1/2}\qquad\text{and}\qquad y_1 \in N$$

imply

(5.13)  $$\theta(y)^Tx - \psi(\theta(y)) - \phi(x) \le -Cn^{-1+2/7}\log^2 n.$$

Consequently,

(5.14)  $$\int_N \exp\{n[\theta(y)^Tx - \psi(\theta(y)) - \phi(x)]\}\,1\{|y - y_1| > n^{1/7-1/2}\log n\}\,f(y)\,\sigma(dy) \le \exp\{-Cn^{2/7}\log^2 n\}.$$

This completes the proof of (5.7). ///
Let

$$M_0 = \{x \in \mathbb{R}^p : \phi(x) < \infty \text{ and } u^Tx = 0\ \forall u \in T\Omega_0(0)\}.$$

Suppose that $N$ is a smooth compact $r$-dimensional submanifold of $M_0 \cap \Gamma_1$ with boundary, e.g.,

$$N = \{x \in M_0 \cap \Gamma_1 : \epsilon_1 \le \phi_1(x) - \phi_0(x) \le \epsilon_2\}.$$

It is possible to mimic the preceding analysis to obtain asymptotic expressions for likelihood functions, but only when $S_n/n$ is not too near the boundary $\partial N$. Near $\partial N$, $N$ is not well approximated by its tangent space, but instead by a half-space of its tangent space.

As before, let $f(\cdot)$ be a smooth, strictly positive probability density on $N$ with respect to the volume element measure $\sigma(\cdot)$, and let

$$Q(A) = \int_N P_{\theta(y)}(A)f(y)\,\sigma(dy).$$
PROPOSITION 2. Suppose $y_1 \in N$ and

(5.15)  $$S_n/n = y_1 + hn^{-1/2},\quad\text{and}\quad \mathrm{dist}(S_n/n, \partial N) > n^{1/7-1/2}\log n,$$

with $|h| < \delta n^{1/7}$. Then as $n \to \infty$,

(5.16)  $$dP_0^{(n)}/dQ^{(n)} \sim e^{-n\phi(S_n/n)}f(y_1)^{-1}(n/2\pi)^{r/2}\cdot\exp\{h^T(\Sigma(\theta(y_1))^{-1} - H_1(y_1))h/2\}\cdot(\det H_2(y_1))^{1/2}.$$

This may be rewritten as

(5.17)  $$dP_0^{(n)}/dQ^{(n)} \sim e^{-\Lambda_n}f(y_1)^{-1}(n/2\pi)^{r/2}\cdot\exp\{h^T(\Sigma(\theta(y_1))^{-1} - H_1(y_1))h/2\}\cdot\exp\{-h^TH_3(y_1)h/2\}\cdot(\det H_2(y_1))^{1/2},$$

where

(5.18)  $$H_3(y) = \nabla_y^2\phi(y) - \nabla_y^2\phi_1(y) + \nabla_y^2\phi_0(y).$$

Relations (5.16) and (5.17) are valid uniformly on the event $\{\mathrm{dist}(S_n/n, N) < \delta n^{1/7-1/2},\ \mathrm{dist}(S_n/n, \partial N) > n^{1/7-1/2}\log n\}$.

Recall that $\Lambda_n = n(\phi_1(S_n/n) - \phi_0(S_n/n))$ is the log-generalized likelihood ratio statistic for testing $H_0$ v. $H_1$. It is worth noting a consequence of (5.17): for each compact $K \subset N\setminus\partial N$ and each $\delta > 0$, there is a $C > 0$ such that

$$\mathrm{dist}(S_n/n, K) < \delta n^{-1/2}\log n$$

implies

(5.19)  $$dP_0^{(n)}/dQ^{(n)} \le e^{-\Lambda_n}\exp\{C\log^2 n\}.$$

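The proof of (5.16) is essentially the same as the proof of (5.7), and (5.17) follows simply from (5.16). Note that

$$n\phi(S_n/n) - n\phi_1(S_n/n) = (1/2n)(S_n - ny_1)^T[\nabla_y^2\phi(y_1) - \nabla_y^2\phi_1(y_1)](S_n - ny_1) + O(n^{-2}|S_n - ny_1|^3)$$

and

$$n\phi_0(S_n/n) = (1/2n)(S_n - ny_1)^T\nabla_y^2\phi_0(y_1)(S_n - ny_1) + O(n^{-2}|S_n - ny_1|^3),$$

since $\phi(y_1) = \phi_1(y_1)$, $\nabla_y\phi(y_1) = \nabla_y\phi_1(y_1) = \theta(y_1)$, $\phi_0(y_1) = 0$, and $\nabla_y\phi_0(y_1) = 0$ for $y_1 \in M_0 \cap \Gamma_1$. Add the two expansions and write $S_n - ny_1 = n^{1/2}h$. This gives $n\phi(S_n/n) = \Lambda_n + h^TH_3(y_1)h/2 + O(n^{-1/2}|h|^3)$, which turns (5.16) into (5.17). That the $O(\cdot)$ terms are uniformly small follows from the compactness of $N$. ///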
6. The Collapsing Argument

The limiting constants in the conclusion of Theorem 2 occur as integrals over certain submanifolds of the mean parameter space. This corresponds to the fact that the bulk of $P_0\{\Lambda_n > n\epsilon\}$ and $P_0\{T_a \le a\epsilon_1^{-1}\}$ is accounted for by sample paths for which $S_n/n$ and $S_T/T$ are near the critical manifolds. The purpose of this section is to prove that the probabilities in question actually do shrink to integrals on the critical manifolds (hence the term "collapsing argument"; it is not meant to suggest any structural deficiency in the proof itself).

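Let

$$T_a = \inf\{n \ge a\epsilon_2^{-1} : \Lambda_n \ge a\},$$

$$M_0 = \{x \in T\Omega_0(0)^\perp : \phi(x) < \epsilon_0\},$$

$$N_b = \{x \in M_0 \cap \Gamma_1 : \phi_1(x) - \phi_0(x) = b\};$$

$$= \{x \in M_0 : \phi(x) = \epsilon\}.$$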


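PROPOSITION 1. Assume that $0 < \epsilon < \epsilon_0$ and $0 < \epsilon_1 < \epsilon_2 < \epsilon_0$. Then for every $\delta, k > 0$

(6.1)  $$P_0\{\Lambda_n \ge n\epsilon;\ \mathrm{dist}(S_n/n, N_\epsilon) > \delta n^{-1/2}\log n\} = o(n^{-k}e^{-n\epsilon})$$

as $n \to \infty$, and

(6.2)  $$P_0\{T_a \le a\epsilon_1^{-1};\ \mathrm{dist}(S_T/T, N_{a/T}) > \delta a^{-1/2}\log a\} = o(a^{-k}e^{-a})$$

as $a \to \infty$.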


The proof will proceed by a series of crude estimates based on a likelihood ratio identity. The first step is to show that only those sample paths for which the sample means $S_n/n$ fall in a certain compact subset of $\mathbb{R}^p$ are of any consequence.


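LEMMA 1. There is a compact set $K \subset \Gamma$ such that $x \in K$ implies $\phi(x) < \infty$; also

(6.3)  $$\{x \in \Gamma : \phi(x) \le \max(\epsilon, \epsilon_2)\} \subset \text{interior of } K,$$

(6.4)  $$P_0\{\exists\, n \ge m : S_n/n \notin K\} = o(e^{-m\epsilon_3})$$

as $m \to \infty$, for some $\epsilon_3 > \max(\epsilon, \epsilon_2)$.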
PROOF. Choose $\epsilon_3, \epsilon_4$ so that

$$\max(\epsilon, \epsilon_2) < \epsilon_3 < \epsilon_4 < \epsilon_0.$$

Then the sets

$$K_3 = \{x \in \Gamma : \phi(x) \le \epsilon_3\},\qquad K_4 = \{x \in \Gamma : \phi(x) \le \epsilon_4\}$$

are compact (this is the reason for condition I on $\epsilon_0$), and $K_3$ is contained in the interior of $K_4$. Thus there is a compact set $K$, contained in the interior of $K_4$ and containing $K_3$ in its interior, whose boundary $\partial K$ is a polyhedron (i.e., $K$ is the intersection of finitely many half-spaces). The existence of $K$ may be deduced from the finite-dimensional Krein-Milman Theorem, the compactness of $\partial K_3$, and the convexity of $\phi$.


Figure  6.1 


Now Chernoff's large deviation theorem for random variables (cf. [4]) gives exponential bounds for $P_0\{S_n/n \notin H\}$, where $H$ is any half-space. Since $K$ is the intersection of finitely many half-spaces, and since

$$P_0\{\exists\, n \ge m : S_n/n \notin K\} \le \sum_{n=m}^{\infty} P_0\{S_n/n \notin K\},$$

(6.4) follows easily. ///
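The proof of (6.4) has two ingredients: an exponential bound for each $n$, and a union bound over $n \ge m$. Both can be seen in a one-dimensional sketch. Assume Bernoulli($p$) observations (our choice). For $c > p$, Chernoff's bound then reads $P\{S_n/n \ge c\} \le \exp\{-nI(c)\}$ with $I(c) = c\log(c/p) + (1-c)\log((1-c)/(1-p))$. Summing over $n \ge m$ gives a geometric series of order $\exp\{-mI(c)\}$.

```python
import math

def kl_bernoulli(c: float, p: float) -> float:
    """Chernoff/Cramer rate I(c) for Bernoulli(p) sample means, 0 < c < 1."""
    return c * math.log(c / p) + (1 - c) * math.log((1 - c) / (1 - p))

def exact_upper_tail(n: int, p: float, c: float) -> float:
    """P{S_n / n >= c} for S_n ~ Binomial(n, p), summed exactly."""
    k0 = math.ceil(n * c - 1e-12)          # guard against n*c landing just above an integer
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

def chernoff_bound(n: int, p: float, c: float) -> float:
    return math.exp(-n * kl_bernoulli(c, p))

def union_bound_from(m: int, p: float, c: float) -> float:
    """sum_{n >= m} exp{-n I(c)}: a geometric series, hence O(exp{-m I(c)})."""
    r = math.exp(-kl_bernoulli(c, p))
    return r**m / (1 - r)
```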

Let $f(\cdot)$ be a smooth probability density on $\Gamma$ (with respect to Lebesgue measure on $\mathbb{R}^p$) which is strictly positive on $K$. Define

(6.5)  $$Q(A) = \int_\Gamma P_{\theta(y)}(A)f(y)\,dy.$$

Then $P_0^{(n)}$ and $Q^{(n)}$, the restrictions of $P_0$ and $Q$ to the $\sigma$-algebra $\mathcal{F}(X_1, \ldots, X_n)$, are mutually absolutely continuous, and

(6.6)  $$dP_0^{(n)}/dQ^{(n)} = \Big[\int_\Gamma e^{\theta(y)^TS_n - n\psi(\theta(y))}f(y)\,dy\Big]^{-1}.$$


LEMMA 2. There is a constant $C > 0$ such that

(6.7)  $$[dP_0^{(n)}/dQ^{(n)}]\,1\{S_n/n \in K\} \le C\,n^{2p}e^{-n\phi(S_n/n)}.$$

PROOF. To obtain an upper bound for (6.6), one may replace the domain of integration $\Gamma$ by a $p$-dimensional cube of side $1/n^2$ centered at $S_n/n$. Since $K$ is compact and $f(\cdot)$ has a strictly positive minimum on $K$, (6.7) follows routinely. ///

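To prove Proposition 1 it now suffices to show that there is a constant $\beta > 0$ with the following property. If $x \in K$, $\epsilon_1 \le b \le \epsilon_2$, and

(6.8)  $$\phi_1(x) - \phi_0(x) \ge b$$

and

(6.9)  $$\mathrm{dist}(x, N_b) > \delta n^{-1/2}\log n,$$

then

(6.10)  $$\phi(x) \ge \beta n^{-1}\log^2 n + b.$$

For then Lemma 2 would imply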
PROOF. First recall that $\phi - \phi_1$ is a non-negative function of $x \in \Gamma$ which is zero only for $x \in \Gamma_1$. Consequently, it suffices to consider only those $x \in K$ for which

(6.15)  $$\mathrm{dist}(x, \Gamma_1) < \eta,$$

where $\eta$ is a small positive number of our choosing. Now $\eta$ may be chosen so small that for $x \in K$ satisfying (6.15), $x \in U_1$ and the MLE map $\hat\theta_1$ is submersive in a neighborhood of $x$ (cf. Lemma 1, Section 3). It may also be chosen small enough that for $x \in K$ satisfying (6.15), the two-term Taylor series for $(\phi - \phi_1)$ around $\mu_{\hat\theta_1(x)}$ is accurate enough that

(6.16)  $$\phi(x) - \phi_1(x) \ge (\phi - \phi_1)(\mu_{\hat\theta_1(x)}) + (x - \mu_{\hat\theta_1(x)})^T\nabla_y(\phi - \phi_1)(\mu_{\hat\theta_1(x)}) + (1/4)(x - \mu_{\hat\theta_1(x)})^T\nabla_y^2(\phi - \phi_1)(\mu_{\hat\theta_1(x)})(x - \mu_{\hat\theta_1(x)}).$$

Since $\mu_{\hat\theta_1(x)} \in \Gamma_1$, it follows from Lemma 1 of Section 3 that

$$\phi(\mu_{\hat\theta_1(x)}) = \phi_1(\mu_{\hat\theta_1(x)})$$

and

$$\nabla_y\phi(\mu_{\hat\theta_1(x)}) = \nabla_y\phi_1(\mu_{\hat\theta_1(x)}) = \hat\theta_1(x).$$

Thus to prove (6.14) it suffices to show that for $x \in K$ satisfying (6.15) there exists a $\beta > 0$ such that

(6.17)  $$(x - \mu_{\hat\theta_1(x)})^T\nabla_y^2(\phi - \phi_1)(\mu_{\hat\theta_1(x)})(x - \mu_{\hat\theta_1(x)}) \ge \beta\,\mathrm{dist}(x, \Gamma_1)^2.$$

Since $\mu_{\hat\theta_1(x)} \in \Gamma_1$, it is clear that

(6.18)  $$|x - \mu_{\hat\theta_1(x)}| \ge \mathrm{dist}(x, \Gamma_1).$$

This proves (6.14). ///

LEMMA 4. For each $\delta > 0$ there exist $\gamma > 0$, $\delta^* > 0$, and $\beta > 0$ such that if $x \in K$, $\epsilon \le \gamma$, and

(6.19)  $$\mathrm{dist}(x, M_0) > \delta\epsilon$$

but

(6.20)  $$\mathrm{dist}(x, \Gamma_1) < \delta^*\epsilon,$$

then

(6.21)  $$\phi_0(x) \ge \beta\epsilon^2.$$

PROOF. $\phi_0(x)$ is a convex non-negative function of $x \in K$ which is zero iff $x \in M_0$. Thus it suffices to consider only those $x \in K$ for which $\mathrm{dist}(x, M_0) < \eta$, where $\eta > 0$ is a constant of our choosing.

Recall that $M_0$ and $\Gamma_1$ intersect transversally whenever they intersect (this by condition III in the definition of $\epsilon_0$). By Lemma 2 of Section 3, there exist $\delta^* > 0$ and $\eta > 0$ small enough that if $x \in K$ satisfies (6.20) and

(6.22)  $$\mathrm{dist}(x, M_0) < \eta,$$

then

(6.23)  $$\mathrm{dist}(x, M_0 \cap \Gamma_1) < \eta^*.$$

Here $\eta^* > 0$ has been chosen small enough that if $x \in K$ satisfies (6.23), then $x \in U_0$ and the MLE map $\hat\theta_0$ is a smooth submersion in a neighborhood of $x$. In addition, $\eta^*$ is small enough that the two-term Taylor series for $\phi_0$ around $\mathbb{P}x$ is accurate enough that

(6.24)  $$\phi_0(\mathbb{P}x) + (x - \mathbb{P}x)^T\nabla\phi_0(\mathbb{P}x) + (1/4)(x - \mathbb{P}x)^T\nabla^2\phi_0(\mathbb{P}x)(x - \mathbb{P}x) = (1/4)(x - \mathbb{P}x)^T\nabla^2\phi_0(\mathbb{P}x)(x - \mathbb{P}x) \le \phi_0(x)$$

($\mathbb{P} = P_{T\Omega_0(0)^\perp}$ denotes the orthogonal projection onto $T\Omega_0(0)^\perp$).

By Lemma 1 of Section 3, $\nabla^2\phi_0(y)$ is strictly P.D. on $T\Omega_0(0)$ for any $y$ such that $\hat\theta_0(y) = 0$. Since $x - \mathbb{P}x \in T\Omega_0(0)$, the lemma follows from (6.19) and an obvious compactness argument. ///

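LEMMA 5. For every $\delta > 0$ there exist $\delta^* > 0$, $\beta > 0$, and $\gamma > 0$ such that if $\epsilon \le \gamma$ and $x \in \Gamma$ satisfies

(6.25)  $$\mathrm{dist}(x, M_0 \cap \Gamma_1) < \delta^*\epsilon,$$

(6.26)  $$\mathrm{dist}(x, N_b) > \delta\epsilon$$

and

(6.27)  $$\phi(x) \ge b,$$

then

(6.28)  $$\phi(x) \ge b + \beta\epsilon.$$

Here $N_b = \{y \in M_0 \cap \Gamma_1 : \phi(y) = b\}$, and (6.28) holds for all $b$ such that $\epsilon_1 \le b \le \epsilon_2$ (for the same $\beta$).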
PROOF. It follows from Lemma 2 of Section 3, together with condition III in the definition of $\epsilon_0$, that there exist $\delta^*$, $\beta^*$, and $\gamma$ such that if $x \in \Gamma$ satisfies (6.25)-(6.26), then

(6.29)  $$\mathrm{dist}(x, \{y \in \Gamma : \phi(y) = b\}) > \beta^*\epsilon.$$

If $x$ also satisfies (6.27), then

(6.30)  $$\mathrm{dist}(x, \{y \in \Gamma : \phi(y) \le b\}) > \beta^*\epsilon.$$

Relation (6.28) follows directly, since $\nabla\phi$ is nonzero on the level surface $\{y \in \Gamma : \phi(y) = b\}$. That it holds uniformly for $\epsilon_1 \le b \le \epsilon_2$ follows from an easy compactness argument. ///

It now follows from Lemmas 3-5 that if $x \in K$ satisfies (6.8)-(6.9), then it satisfies (6.10) (since $\phi \ge \phi_1 \ge \phi_0 \ge 0$). This proves Proposition 1. ///
7. Proof of Theorem 2


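The proof of Theorem 2 will be based on the likelihood ratio identity

(7.1)  $$P_0(A) = \int_M [E_{\theta(y)}1_AL_T]\,\sigma_M(dy)/\sigma(M),$$

where

(7.2)  $$M = M(\epsilon_5, \epsilon_6; 0) = \{y \in \Gamma_1 \cap T\Omega_0(0)^\perp : \epsilon_5 \le \phi_1(y) - \phi_0(y) \le \epsilon_6\},$$

(7.3)  $$0 < \epsilon_5 < \epsilon_1 < \epsilon_2 < \epsilon_6 < \epsilon_0,$$

(7.4)  $$L_T = d(P_0|\mathcal{F}_T)/d(Q|\mathcal{F}_T),$$

and

(7.5)  $$Q(B) = \int_M P_{\theta(y)}(B)\,\sigma_M(dy)/\sigma(M).$$

Recall that $M$ is a compact manifold-with-boundary, so its total surface area $\sigma(M)$ is finite. The relation (7.1) holds for all events $A \in \mathcal{F}_T$ with $A \subset \{T < \infty\}$. Here $\mathcal{F}_T$ is the "stopped" sigma algebra of events $A$ for which $A \cap \{T \le n\} \in \mathcal{F}(X_1, \ldots, X_n)$ for every $n$. The relation (7.5) holds for all $B \in \mathcal{F}(X_1, X_2, \ldots)$.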
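Identity (7.1) is the change-of-measure formula $P_0(A) = E_Q[1_AL_T]$, written out with $Q$ as the mixture (7.5). The sketch below verifies the fixed-sample-size analogue exactly. It makes simplifying assumptions that are ours, not the paper's: $T \equiv n$, Bernoulli observations, and a three-point mixture in place of $\sigma_M(dy)/\sigma(M)$. Because $dP_0/dQ$ depends on the data only through $S_n$, both sides reduce to finite binomial sums.

```python
import math

def lr_identity_sides(n, p0, ps, weights, k_min):
    """Return (P_0(A), E_Q[1_A dP_0/dQ]) for A = {S_n >= k_min}, where P_p is the law of n i.i.d.
    Bernoulli(p) trials and Q = sum_j weights[j] * P_{ps[j]} (weights summing to 1)."""
    def seq_prob(p, k):
        # probability of one particular 0-1 sequence containing k ones
        return p**k * (1.0 - p)**(n - k)

    lhs = sum(math.comb(n, k) * seq_prob(p0, k) for k in range(k_min, n + 1))

    rhs = 0.0
    for k in range(k_min, n + 1):
        q_k = sum(w * seq_prob(p, k) for p, w in zip(ps, weights))   # Q-probability of one sequence
        lr = seq_prob(p0, k) / q_k                                     # dP_0/dQ depends only on S_n = k
        # average over mixture components of E_{P_p}[1_A * dP_0/dQ], as in (7.1)
        rhs += sum(w * math.comb(n, k) * seq_prob(p, k) * lr for p, w in zip(ps, weights))
    return lhs, rhs
```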

According to Proposition 1 of Section 6,

(7.6)  $$P_0\{T_a \le a\epsilon_1^{-1};\ \mathrm{dist}(S_T/T, M(\epsilon_1, \epsilon_2; 0)) > \delta a^{-1/2}\log a\} = o(a^{-k}e^{-a})$$

as $a \to \infty$, for all $k, \delta > 0$. Consequently, it suffices to show that

(7.8)  $$A = \{T_a \le a\epsilon_1^{-1};\ a^{-1/2}\log a \ge \mathrm{dist}(S_T/T, M(\epsilon_1, \epsilon_2; 0))\}.$$

The plan of the proof, then, will be to evaluate (7.1) for $A$ defined in (7.8) by exploiting the asymptotic formula for $1_AL_T$ provided by Proposition 2 of Section 5.

NOTE: Throughout the rest of this section, $A$ will be the event defined by (7.8).

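The manifold $M$ divides neatly into three zones, in each of which the integrand behaves differently. These are

(7.9)  $$M_1 = \{y \in M : \epsilon_1 + a^{-1/2+\eta} < \phi_1(y) - \phi_0(y) < \epsilon_2 - a^{-1/2+\eta}\},$$
$$M_2 = \{y \in M : \phi_1(y) - \phi_0(y) \le \epsilon_1 + a^{-1/2+\eta} \text{ or } \phi_1(y) - \phi_0(y) \ge \epsilon_2 - a^{-1/2+\eta}\} \cap \{y \in M : \mathrm{dist}(y, M(\epsilon_1, \epsilon_2; 0)) < a^{-1/2+\eta}\},$$
$$M_3 = \{y \in M : \mathrm{dist}(y, M(\epsilon_1, \epsilon_2; 0)) \ge a^{-1/2+\eta}\},$$

where $0 < \eta < 1/32$ is some fixed constant. It is clear that $M_1$ increases to the interior of $M(\epsilon_1, \epsilon_2; 0)$, and that $M_2$ decreases to $\partial M(\epsilon_1, \epsilon_2; 0)$ (thus $\sigma(M_2) \to 0$ as $a \to \infty$). It will develop that $\int_{M_1} \sim$ RHS (3.22), and that $\int_{M_2}$ and $\int_{M_3}$ are of smaller order of magnitude.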


For $y \in M$, define

(7.10)  $$n_1 = n_1(a,y) = \lfloor a/(\phi_1(y) - \phi_0(y)) - a^{1/2+\eta}\rfloor,$$
$$n_2 = n_2(a,y) = \lfloor a/(\phi_1(y) - \phi_0(y)) - a^{1/2+\eta/2}\rfloor,$$
$$n_3 = n_3(a,y) = \lfloor a/(\phi_1(y) - \phi_0(y)) + a^{1/2+\eta/2}\rfloor,$$

and

(7.11)  $$\zeta_n(y) = (S_n - ny)^T(\Sigma(\theta(y))^{-1} - H_1(y) - H_3(y))(S_n - ny)/2n,$$

where $H_1(y)$ and $H_3(y)$ are as in (3.26) and (3.28).

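LEMMA 1. As $a \to \infty$,

(7.12)  $$\max_{y\in M} P_{\theta(y)}\{|S_n - ny| > C a^{1/2+\beta},\ \text{some } n \in [a\epsilon_2^{-1}, a\epsilon_1^{-1}]\} = o(e^{-a^{\beta}}),$$

(7.13)  $$\max_{y\in M_1} P_{\theta(y)}\{T_a \notin [n_2(a,y), n_3(a,y)]\} = o(e^{-a^{\eta/4}}),$$

(7.14)  $$\max_{y\in M} P_{\theta(y)}\{|\zeta_n(y) - \zeta_{n_1}(y)| > C/\log a,\ \text{some } n \in [n_1, n_3]\} = o(e^{-a^{1/16}}),$$

and

(7.15)  $$\max_{y\in M} P_{\theta(y)}\{|(S_n/n) - (S_{n_1}/n_1)| > C a^{-1/2}\log a,\ \text{some } n \in [n_1, n_3]\} = o(e^{-a^{1/32}}),$$

for all $C > 0$ and $0 < \beta < 1/6$.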
This is a routine consequence of Corollary 1, Section 4; the proof is left to the reader.

Next define events

(7.16)  $$A_y = \{|(S_{n_1}/n_1) - y| < a^{-1/2+\eta}\},$$
$$A_y^* = \{|(S_{n_1}/n_1) - (S_T/T)| < a^{-1/2}\log a\} \cap \{|\zeta_{n_1}(y) - \zeta_T(y)| < 1/\log a\},$$
$$A_y^{**} = A_y^* \cap \{T_a \in [n_2(a,y), n_3(a,y)]\}.$$

By Proposition 2, Section 5,

(7.17)  $$L_T1_A1_{A_y}1_{A_y^*} \sim e^{-\Lambda_T}\exp\{\zeta_{n_1}(y)\}(T/2\pi)^{(q_1-q_0)/2}\cdot(\det H_2(y))^{1/2}\sigma(M)1_A1_{A_y}1_{A_y^*}$$

and

(7.18)  $$L_T1_A1_{A_y}1_{A_y^{**}} \sim e^{-\Lambda_T}\exp\{\zeta_{n_1}(y)\}\sigma(M)1_A1_{A_y}1_{A_y^{**}}\cdot(a/2\pi(\phi_1(y) - \phi_0(y)))^{(q_1-q_0)/2}\cdot(\det H_2(y))^{1/2}.$$

Furthermore, these relations are valid uniformly for $y \in M$ and uniformly on the events $A_y \cap A_y^*$ and $A_y \cap A_y^{**}$, respectively.
LEMMA 2. As $a \to \infty$,

(7.19)  $$P_{\theta(y)}\{|E_{\theta(y)}[e^{-(\Lambda_T - a)}1_{A_y}1_{A_y^{**}} \mid \mathcal{F}_{n_1(a,y)}] - \nu(y)| > \epsilon\} \to 0$$

for every $\epsilon > 0$ and every $y \in M(\epsilon_1, \epsilon_2; 0)$ such that the random variable

$$(\phi_1(y) - \phi_0(y)) + (X_1 - y)^T\nabla(\phi_1 - \phi_0)(y)$$

has a nonlattice distribution when $X_1 \sim P_{\theta(y)}$ (cf. Lemma 4, Section 3).

The $\sigma$-algebra $\mathcal{F}_{n_1}$ is the one generated by $X_1, \ldots, X_{n_1}$. It should be noticed that the convergence indicated by (7.19) need not be uniform in $y$. Fortunately, the random variables in (7.19) are bounded.

Lemma 2 is very much related to the nonlinear renewal theorem of Lai and Siegmund [9]. Although the statement of their theorem does not imply Lemma 2, their proof does. In fact, they obtain an unconditional limit theorem by first proving a conditional statement, which in our case becomes

(7.20)  $$E_{\theta(y)}[e^{-(\Lambda_T - a)} \mid \mathcal{F}_{n_1}] - \nu(y) \to 0 \quad \text{A.S. } (P_{\theta(y)}).$$

Since $1_{A_y}1_{A_y^{**}} \to 1$ A.S. $(P_{\theta(y)})$ (cf. Lemma 1), (7.19) follows from this.

The key to evaluating $E_{\theta(y)}1_A1_{A_y}1_{A_y^{**}}L_T$ is now provided by Corollary 2 of Section 4. This shows that, uniformly for $y \in M$,

(7.21)  $$E_{\theta(y)}1_{A_y}\exp\{\zeta_{n_1}(y)\} \to (\det\Sigma(\theta(y))(H_1(y) + H_3(y)))^{-1/2}$$

as $a \to \infty$. Consequently, by (7.18), (7.19), and the Dominated Convergence Theorem,

(7.22)  $$a^{-(q_1-q_0)/2}e^{a}\int_{M_1}E_{\theta(y)}[1_A1_{A_y}1_{A_y^{**}}L_T]\,\sigma_M(dy)/\sigma_M(M) \to C(\epsilon_1, \epsilon_2; 0).$$

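Recall from Proposition 2 of Section 5 (5.19) that

(7.23)  $$1_AL_T \le e^{-\Lambda_T}\exp\{C\log^2 T\}$$

for some constant $C$. By Lemma 1,

$$\max_{y\in M_1} P_{\theta(y)}(A\setminus(A_y \cap A_y^{**})) = o(e^{-a^{\eta/4}});$$

thus

(7.24)  $$\int_{M_1} a^{-(q_1-q_0)/2}e^{a}E_{\theta(y)}[1_{A\setminus(A_y\cap A_y^{**})}L_T]\,\sigma_M(dy)/\sigma_M(M) \to 0 \quad\text{as } a \to \infty.$$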


For $y \in M_2$, (7.17) and a compactness argument show that there is a constant $C^* > 0$ such that

$$a^{-(q_1-q_0)/2}e^{a}L_T1_A1_{A_y}1_{A_y^*} \le C^*\exp\{\zeta_{n_1}(y)\}1_{A_y},$$

and (7.21) implies that there is a constant $C^{**} > 0$ such that

$$E_{\theta(y)}\,a^{-(q_1-q_0)/2}e^{a}L_T1_A1_{A_y}1_{A_y^*} \le C^{**}.$$
This, (7.23), and the fact that

$$\max_{y\in M_2} P_{\theta(y)}(A\setminus(A_y \cap A_y^*)) = o(e^{-a^{\eta/4}})$$

imply

(7.25)  $$\int_{M_2} a^{-(q_1-q_0)/2}e^{a}E_{\theta(y)}[L_T1_A]\,\sigma_M(dy)/\sigma_M(M) \to 0,$$

since $\sigma(M_2) \to 0$.

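Finally, for $y \in M_3$ and all sufficiently large $a$,

(7.26)  $$A \cap \{|S_n/n - y| < a^{-1/2+\eta/2} \text{ for all } n \in [a\epsilon_2^{-1}, a\epsilon_1^{-1}]\} = \emptyset.$$

By Lemma 1,

$$\max_{y\in M_3} P_{\theta(y)}\{|S_n/n - y| \ge a^{-1/2+\eta/2} \text{ for some } n \in [a\epsilon_2^{-1}, a\epsilon_1^{-1}]\} = o(e^{-a^{\eta/4}}).$$

This, (7.23), and (7.26) imply

(7.27)  $$\int_{M_3} a^{-(q_1-q_0)/2}e^{a}E_{\theta(y)}[1_AL_T]\,\sigma_M(dy)/\sigma_M(M) \to 0.$$

The relations (7.1), (7.22), (7.24), (7.25), and (7.27) prove Theorem 2. ///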